
RSG 304:

Spatial Data Analysis & Integration

Department of Remote Sensing and Geoscience Information Systems,


Federal University of Technology Akure.

B. J. Fagbohun
Course Content
• Spatial Phenomenon – Geographic Object, Geographic field
(discrete and continuous)
• Spatial Data Model – Vector, Tessellation, Triangular Irregular
Network (TIN)
• Manual and Automatic Catchment Delineation (Watershed)
• Surface Analysis – Slope; Aspect; Curvature; Flow Direction, Flow
Accumulation, TWI (Using Map Algebra)
• Interpolation – Kriging; Spline; Trend Surface; Inverse Distance
Weighted (IDW) technique, Thiessen Polygon, TIN.
• Classification – Reclassification, Resampling
• Proximity Analysis (Measurement) – Buffer; Euclidean Distance
• Spatial Queries – Measurement and Retrieval
• Vector Overlay – Clipping, Dissolve, Merge, Union, Intersect
• Raster Overlay – Raster Overlay using Map Algebra; Weighted
Overlay; Weighted Sum, Fuzzy Overlay
• Network Analysis – Optimal-path Finding, Network Partitioning,
Network Allocation, Trace Analysis
• Diffusion Computation, Flow Computation

Representation of Geographic
Phenomena
• Geographic phenomena are classified into two types:
1. Geographic (discrete) object
2. Geographic field: (a) Discrete field
(b) Continuous field

• In the spatial domain, spatial phenomena are considered to occur in a two- or three-dimensional Euclidean space.

• Euclidean space: location is represented by (x, y) coordinates in 2D space and by (x, y, z) coordinates in 3D space.
Geographic object
• Geographic objects are not present everywhere but are sparsely distributed within the study area. They can be easily distinguished and named, and two objects do not occupy the same location. Their position in space is determined by one or more of the following parameters:
– location (where is it?)
– shape (what form does it have?)
– size (how big is it?)
– orientation (in which direction is it facing?)
• Shape is an important component because one of its factors is dimension: this relates to whether an object is perceived as a point, a linear feature, an area or a volume feature.

Geographic field
• A geographic field has a value at every point within the study area. It can be described by a mathematical function that assigns a specific value to every position in the study area.
• A geographic field can be a discrete or a continuous field.
• A discrete field divides the study space into mutually exclusive bounded parts, with all locations in one part having the same value. Examples of discrete fields are rock types and soil types.
• In a continuous field, the value along any path through the study area does not change abruptly but only gradually. The change in field value per unit distance in any direction can be determined. Examples of continuous fields include elevation, temperature, rainfall and humidity.
[Figure annotation: $f(y) = \int_{x} f(x, y)\,dx$]
Boundaries
• Boundaries relate to geographic objects as well as discrete geographic fields (phenomena with area and size).
There are two types of boundary:
(a) crisp boundary: can be determined at an arbitrary level of precision
(b) fuzzy boundary: the boundary is itself an area of transition
• Generally, crisp boundaries are commonly associated with geographic objects, e.g. buildings.
• Fuzzy boundaries are common in discrete geographic fields, e.g. soil and rock types.

GIS Subsystems
Data Input Subsystem
Data Storage and Retrieval Subsystem
Data Manipulation and Analysis Subsystem
Data Output Subsystem

GIS/Spatial Analysis Data Types


Spatial Data
Attribute Data

Data Values
• Different types of data values may represent the attributes of a geographic phenomenon. The type of data value determines the kinds of analysis that can be done on the data.
1. Nominal: provides a name or identifier to discriminate between values. No true computation can be done on such values. Sometimes called categorical data, e.g. the name of a road.
2. Ordinal: values that can be put in sequence but permit no other type of computation, e.g. roads can be classified as highway, township street road, footpath based on width.
3. Interval: numerical values that permit simple computation of addition and subtraction only. They have no arithmetic zero, e.g. temperature, dates.
4. Ratio: numerical values that permit most types of arithmetic computation. They have an arithmetic zero.

Start X   Start Y   Road Name               Road class      Year Built   Length (km)
2526663   4585966   Ilesha-Ife Expressway   Highway         2002         800
2456669   4825669   Oba Adesida Road        Township road   2000         100

Spatial Data Models for Geographic
Phenomena in a GIS
Three approaches are used for the representation of geographic phenomena in a GIS:
a) Tessellation (Regular and Irregular)
b) Vector (Point, Line and Polygon)
c) Triangular Irregular Network (TIN)

Raster and vector are the two most common methods of representing geographic data in GIS.

Regular Tessellation
Tessellation:
• Partitions the study space into mutually exclusive cells.
• A thematic value is assigned to each cell, which characterizes that part of the space.
• A tessellation can be regular or irregular.
Regular tessellation: cells have the same size and shape.
• The square-cell tessellation, commonly called raster, is the most commonly used because georeferencing is easy.
• The size of the area covered by a raster cell is called the resolution of the raster.

Regular Tessellation: RASTER
• Since the value of a cell applies to the entire cell area, there is a need to solve the problem of boundaries. For raster cells, the convention is that the lower and left boundaries belong to a cell, and the lower-left corner carries the value.
• Solving the issue of continuity in continuous phenomena:
1. Make the cell size smaller so that the continuity gap between cells becomes smaller.
2. Assume that the value assigned to a cell represents a particular location within the cell, and that it provides a good basis for estimating the value at all other locations, consistent with the continuity of the phenomenon.
• Regular tessellation provides a simple structure that is straightforward but not very adaptive to geographic phenomena.
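The boundary convention above (lower and left boundaries belong to a cell) can be sketched in Python. This is only an illustrative sketch: the function name `cell_index`, the lower-left origin (`x_min`, `y_min`) and counting rows upward from the bottom are assumptions made for the example, not part of the slides.

```python
import math

def cell_index(x, y, x_min, y_min, cell_size):
    """Return (row, col) of the cell containing (x, y).

    Rows are counted upward from the lower-left origin (an assumption).
    Using floor means a point lying exactly on a cell's lower or left
    boundary is assigned to that cell, as in the convention above.
    """
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y - y_min) / cell_size)
    return row, col

# A point on the shared boundary x = 30 belongs to the cell on its right.
print(cell_index(30.0, 0.0, 0.0, 0.0, 30.0))   # (0, 1)
print(cell_index(29.9, 0.0, 0.0, 0.0, 30.0))   # (0, 0)
```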
Irregular Tessellation
• Partitions the study space into mutually exclusive cells.
• Cells are of different sizes and shapes, which makes the structure more adaptive to spatial phenomena.

Irregular Tessellation: Quadtree
• The most common is the quadtree: it partitions the space into 4 quadrants, and the partitioning stops only when all cells in a quadrant have the same value.
• Neighbouring cells with equal values are merged.
• The procedure produces an upside-down tree-like structure, hence the name quadtree.
[Figure: quadtree example; the quadrant labels include NW, NE and SE]
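The recursive partitioning behind a quadtree can be sketched as follows. This is a minimal illustration, assuming a square grid whose side is a power of two and whose first row is the northern edge; the function name `build_quadtree`, the dictionary layout and the sample grid are hypothetical.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return the cell value if the block is uniform, else a dict of 4 quadrants."""
    if size is None:
        size = len(grid)
    values = {grid[r][c]
              for r in range(row, row + size)
              for c in range(col, col + size)}
    if len(values) == 1:
        # All cells in this quadrant share one value: stop partitioning.
        return values.pop()
    half = size // 2
    return {
        "NW": build_quadtree(grid, row, col, half),
        "NE": build_quadtree(grid, row, col + half, half),
        "SW": build_quadtree(grid, row + half, col, half),
        "SE": build_quadtree(grid, row + half, col + half, half),
    }

example = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 5],
    [3, 3, 5, 5],
]
tree = build_quadtree(example)
print(tree)
```

In this example only the south-east quadrant needs a second level of subdivision; the other three quadrants are stored as single values, which is where the storage saving comes from.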
Vector
• Georeferences are explicitly associated with geographic phenomena.
• A vertex represents a point; sequential vertices represent a line segment.
• Each vertex consists of (x, y) coordinates in 2D space and (x, y, z) coordinates in 3D space.
Point: used to represent objects that can best be described as 0-dimensional. Whether this applies depends on the purpose of the application and the extent of the object compared to the scale used. Represented with a node/vertex.
Line: used for phenomena that can best be described as 1-dimensional. Represented with two end nodes and zero or more internal nodes. The straight part of a line between two nodes is called a line segment.
Area: represented using an arc/node structure that determines the area's boundary. The boundary model is used to store areas in the case of discrete fields.
Vector: line and area
• One approach is to store each polygon with its own boundary, but this results in data redundancy.
• A better approach is to use the boundary model, also called the topological model.

Vector: Topology
• Topological Relationships
The mathematical properties of the geometric space used for spatial data may be described as follows:
1. The space is a 3D Euclidean space in which we can determine for every point a coordinate triplet (x, y, z) of real numbers. In this space, we can define features such as points, lines, polygons and volumes as geometric primitives of the respective dimension. A point is zero-dimensional, a line is one-dimensional, a polygon is two-dimensional and a volume is three-dimensional.
2. The space is a metric space, which means that the distance between two points can be computed according to a given function.
3. The space is a topological space, which means that for every point in the space we can find a neighbourhood around it that fully belongs to that space.
4. Interiors and boundaries are properties of spatial features that remain invariant under topological mapping.

Vector: Topology
In the topological space, (geographic) features are defined as simplices, as they are the simplest geometric shapes of a given dimension:
– point (0-simplex)
– line segment (1-simplex)
– triangle (2-simplex)
– tetrahedron (3-simplex)

Vector: Topology
• The following are topological rules in 2D space:
1. Every 1-simplex must be bounded by two 0-simplices.
2. Every 1-simplex borders two 2-simplices.
3. Every 2-simplex has a closed boundary consisting of an alternating sequence of 0-simplices and 1-simplices.
4. Around every 0-simplex there exists an alternating sequence of 1- and 2-simplices.
5. 1-simplices only intersect at their bounding nodes.

Vector: Topology
• Topology deals with spatial properties that are invariant under specific transformations.
• It is a mathematical approach that enables data to be structured based on the principles of adjacency and connectivity.
• Topological relationships are built from simple elements to more complex elements: nodes define line segments, and line segments connect to define lines, which in turn define polygons.

Line   Start node   End node   Area right   Area left   Vertex list
b1     4            1          A            W           ...
b2     1            2          A            B           ...
b4     2            4          A            C           ...
b3     1            3          B            W           ...
b6     3            2          B            C           ...
b5     3            4          C            W           ...

Note: W represents the outside polygon (no polygon).
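Because each line in the boundary model stores the polygon on its right and on its left, adjacency between polygons can be read directly from the table without any geometric computation. The sketch below illustrates this with the six lines of the table; the data structure and the function name `adjacent_polygons` are assumptions made for the example.

```python
# line id: (start node, end node, area right, area left), copied from the table
lines = {
    "b1": (4, 1, "A", "W"),
    "b2": (1, 2, "A", "B"),
    "b4": (2, 4, "A", "C"),
    "b3": (1, 3, "B", "W"),
    "b6": (3, 2, "B", "C"),
    "b5": (3, 4, "C", "W"),
}

def adjacent_polygons(lines, outside="W"):
    """Map each polygon to the set of polygons it shares a boundary line with."""
    adjacency = {}
    for _start, _end, right, left in lines.values():
        if outside in (right, left):
            continue  # the line borders the outside area, not a second polygon
        adjacency.setdefault(right, set()).add(left)
        adjacency.setdefault(left, set()).add(right)
    return adjacency

print(adjacent_polygons(lines))
```

For this table every polygon turns out to be adjacent to the other two (A to B via b2, A to C via b4, B to C via b6).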


Triangular Irregular Network (TIN)
• A hybrid between raster and vector representation.
• Commonly used for the representation of digital terrain models, but can be used for any continuous field.
• Built on a set of locations that have a value (e.g. elevation). The locations can be arbitrarily scattered in space. Locations are 3D (x, y, z).
• From these points, triangles can be constructed.
• In three-dimensional space, three points uniquely determine a plane provided they are not collinear, i.e. they are not on the same line.
• A plane fitted through these points has an aspect and a gradient (slope in the case of elevation).
• By restricting the use of a plane to the triangular area between its three anchor points, a triangular tessellation of the complete study space is obtained.
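Triangular Irregular Network (TIN)
• It is possible to construct more than one TIN from the same set of anchor points, and the estimated value at a location P depends on the triangulation chosen.
• The triangulation that produces optimal triangles is referred to as the Delaunay triangulation.
• Rules for generating optimal triangles include:
1. the triangles are as equilateral as possible
2. the circle through the three anchor points of a triangle (its circumcircle) should not contain any other anchor point

Example (weighted average of the three anchor values 1550, 980 and 1250, with weights 50, 20 and 20):

$$P = \frac{(1550 \times 50) + (980 \times 20) + (1250 \times 20)}{50 + 20 + 20} = \frac{122100}{90} = 1356.67$$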
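The estimate of P can be checked numerically. The sketch below treats 1550, 980 and 1250 as the anchor values and 50, 20 and 20 as their weights (an interpretation of the slide figures); dividing the weighted sum by the sum of the weights reproduces the stated result of 1356.67. The function name `weighted_average` is hypothetical.

```python
def weighted_average(values, weights):
    """Weighted mean: sum(value * weight) divided by sum(weight)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

p = weighted_average([1550, 980, 1250], [50, 20, 20])
print(round(p, 2))  # 1356.67
```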
Triangular Irregular Network (TIN)
• A TIN can also be generated from a raster surface, since a raster is essentially a regular grid of points.
• A TIN is not purely a vector representation: each anchor point has a stored georeference (x, y), as in a vector model, yet it can also be called an irregular tessellation because the chosen triangulation provides a partitioning of the entire study space. Hence it is a hybrid between vector and tessellation.

Spatial Data Models for Geographic
Phenomena in a GIS
• Both raster and vector can be used to code both fields and discrete objects.
• TIN is mostly used for the representation of fields, particularly continuous fields.
• In practice there is a strong association between raster and fields, and between vector and discrete objects.
Representation of Spatial fields
• A continuous field can be represented by means of a tessellation, a TIN or a vector. The choice between them is determined by the requirements of the application in mind.
• It is more common to use a tessellation, notably raster, for field representation; however, vector representation can also be used.
Use of Tessellation (Raster): elevation is usually represented by raster data (commonly called a DEM). Each pixel carries an elevation value. Colour is used for display to enable perception of the variation of pixel values.

Representation of Spatial fields
Vector: continuous fields are usually represented in vector form using isolines. An isoline is a linear feature that connects points of equal value.
• In the case of elevation, the isoline is called a contour.
• On traditional topographic maps, elevation is usually presented as contours.
• This representation is not commonly used, but it is sometimes used in geoinformation visualization.

Representation of Spatial fields
Tessellation: when a discrete field is represented as a tessellation, a group of cells is used to define each class of the field.

Vector: discrete fields are usually represented in vector form using polygons. An attribute that differentiates the classes is assigned to each polygon.

Six approximate representations of a field used in GIS
1. Regularly spaced sample points
2. Irregularly spaced sample points
3. Rectangular cells
4. Irregularly shaped polygons
5. Triangulated Irregular Network (TIN)
6. Isolines

From Longley, P. A., M. F. Goodchild, D. J. Maguire and D. W. Rhind (2001), Geographic Information Systems and Science, Wiley, 454 p.
Representation of Geographic Object
Tessellation: remotely sensed data are an important source of data for many GIS applications. Unprocessed digital images contain pixels, each carrying a reflectance value.
• Reflectance values can be processed into a classified image, using one of several available algorithms, and stored as raster.
• Image classification assigns each pixel to one of a finite list of classes, thereby obtaining the information content of the image.
• In representing geographic objects as raster:
– Lines and points appear awkward.
– Area objects are conveniently represented, although boundaries may appear jagged. This is a result of raster resolution versus area size, and of artificial cell boundaries.
Vector: geographic objects are represented as points, lines and polygons.
Advantages and Disadvantages of
Tessellation
• Advantages
– Due to the nature of the data storage technique, data analysis is usually easy to program and quick to perform.
– Due to the inherent nature of raster maps (e.g. one-attribute maps), raster is ideally suited for mathematical modelling and quantitative analysis.
– Discrete data and continuous data are equally accommodated, which facilitates the integration of the two data types.
– The geographic location of each cell is implied by its position in the matrix. Other than the origin point, no other part of a raster carries a coordinate, which leads to lower storage requirements.
– Efficient for image processing
• Disadvantages
– The cell size determines the resolution at which the data is represented
– Processing of associated attribute data may be cumbersome if a large amount of data exists.
– A raster map reflects only one attribute or characteristic of an area
– Cell boundaries are fixed and artificial, which is usually not representative of natural phenomena
– Connectivity analysis is complex to execute on raster data
Advantages and disadvantages of
Vector
• Advantages
– Data can be represented at its original resolution and form without generalization
– Graphic output is usually aesthetically pleasing
– Accurate geographic locations of features are maintained
– Allows for efficient encoding of topology and, as a result, more efficient operations that require topological information
• Disadvantages
– The location of each vertex is stored explicitly, resulting in higher storage requirements
– Continuous data such as elevation are not efficiently represented in vector form
– Spatial analysis and filtering within polygons is impossible
– For effective analysis, vector data must be converted into topological structures, which is processing intensive and requires extensive data cleansing
– Algorithms for manipulation and analysis functions such as overlay are complex and may be processing intensive
Spatial Data Analysis

Modelling surfaces
• The most commonly analysed surface is the surface of the earth.
• Terrain is almost always represented as having a single height for any point. What cannot be represented like this?
• Many different terrain data sets exist, e.g. those derived from LIDAR and RADAR.
• We want to know how we can compute and use them in spatial analysis!
Spatial Analysis of Surfaces
• Elevation surface: the ground surface elevation at each point.
• Serves as input for the computation of other surfaces.
• Most of the time, surfaces are computed at the scale of a watershed.

A watershed is an area of land that is drained by a distinct stream and river system; it is usually separated from other watersheds by the crests of hills or mountains. It is also called a catchment or drainage basin.
Modelling surfaces
• Slope is formally described by a plane tangent to a point on the surface.
• Slope is a vector that has two components:
– Gradient: the maximum rate of change of the elevation of the plane (the angle that the plane makes with a horizontal surface). It is the steepness or degree of inclination of a surface.
– Aspect: the direction of the plane with respect to some arbitrary zero (usually north!).
– Slope is a vector: gradient defines the magnitude of slope, aspect defines its direction.
• Note that many people/GIS (including me) mix up the terms slope and gradient.
• Defined by the surface derivatives of z: (dz/dx, dz/dy)
Modelling surfaces: Definitions
• Flow direction: Inferred direction of water flow
given a surface representation
• Flow accumulation: Total flow accumulated
(e.g. the sum) at a given point on a surface –
equivalent to the drainage area upstream of a
given point
• Sink: A local minimum (depression) in a
surface representation
• Catchment: The drainage area associated
with a point, often defined with respect to a
stream junction
Spatial Analysis of Surfaces
Modelling surfaces: Definitions
Curvature is the rate of change of slope. Evans (1980) suggested two important components, both of which can be convex, concave or planar.
Profile curvature
• Rate of change of gradient
– Convex: increase in slope gradient (A)
– Concave: decrease in gradient downslope (B)
– Planar: constant gradient (C)
Plan curvature
• Rate of change of aspect
– Convex: aspect lines diverge downslope (A)
– Concave: aspect lines converge downslope (B)
– Planar: aspect constant (C)

Combination of profile and plan curvature
[Figure: the nine combinations of profile curvature (convex, concave, planar) with plan curvature (convex, concave, planar)]
https://www.esri.com/arcgis-blog/products/product/imagery/understanding-curvature-rasters/
Spatial Analysis (Watershed Delineation) – Manual
Spatial Analysis (Watershed Delineation) – Automatic

• In GIS we typically simplify hydrological systems by assuming overland flow (no water absorbed by soils) and channelled flow (flow converges)
• Typically we derive flow direction, check for sinks, calculate flow accumulation, and derive the channel network and catchments
ArcGIS Tools
– Basin tool: does not require a pour point
– Watershed tool: requires a pour point
[Figure: schematic of hydrological tools in ArcGIS]
Why does terrain matter?
• Many physical processes in the environment are
dependent on the properties of terrain
• Topography shapes and is shaped by
hydrological processes
• Catchments are a key unit in understanding
natural processes and anthropogenic influences
on the landscape, e.g.:
– Prediction of stream formation and erosion
– Sediment and nutrient transport in landscape
– Flood forecasting e.g. through snow melt, thunder
storms etc…
Modelling surfaces

• Continuity implies a value at every point
• Different levels of continuity (can be defined mathematically)

[Figure after Chrisman: three graphs of f(x) against x: (1) step-wise continuous; (2) continuous with abrupt change in slope; (3) continuous with continuous rate of change of slope]

You should understand what these properties mean in terms of the value and derivatives of the function f(x) for each of these graphs. Can we calculate f'(x) everywhere?


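Standard Slope Function

For a 3 × 3 window of elevation values:

a b c
d e f
g h i

$$\frac{dz}{dx} = \frac{(a + 2d + g) - (c + 2f + i)}{8 \times \text{x\_mesh\_spacing}}$$

$$\frac{dz}{dy} = \frac{(g + 2h + i) - (a + 2b + c)}{8 \times \text{y\_mesh\_spacing}}$$

$$\frac{\text{rise}}{\text{run}} = \sqrt{\left(\frac{dz}{dx}\right)^2 + \left(\frac{dz}{dy}\right)^2}$$

$$\text{slope (deg)} = \operatorname{atan}\left(\frac{\text{rise}}{\text{run}}\right)$$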
Aspect – the steepest downslope
direction

$$\text{Aspect} = \operatorname{atan}\left(\frac{dz/dx}{dz/dy}\right)$$
Example: Moving 3 × 3 window
Elevation grid (cell size 30):

80 74 63 78 75 65
69 67 56 64 66 57
60 52 50 60 53 47
81 72 48 77 74 63
68 55 45 69 67 56
58 60 52 43 40 35

The 3 × 3 window (a to i) placed over the top-left cells:

a = 80   b = 74   c = 63
d = 69   e = 67   f = 56
g = 60   h = 52   i = 50
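Example: Slope and Aspect
Window (cell size 30):

a = 80   b = 74   c = 63
d = 69   e = 67   f = 56
g = 60   h = 52   i = 48

$$\frac{dz}{dx} = \frac{(a + 2d + g) - (c + 2f + i)}{8 \times \text{x\_mesh\_spacing}} = \frac{[80 + (2 \times 69) + 60] - [63 + (2 \times 56) + 48]}{8 \times 30} = 0.229$$

$$\frac{dz}{dy} = \frac{(g + 2h + i) - (a + 2b + c)}{8 \times \text{y\_mesh\_spacing}} = \frac{[60 + (2 \times 52) + 48] - [80 + (2 \times 74) + 63]}{8 \times 30} = -0.329$$

$$\frac{\text{rise}}{\text{run}} = \sqrt{(0.229)^2 + (-0.329)^2} = 0.401$$

$$\text{Slope} = \operatorname{atan}(0.401) = 21.8^\circ$$

$$\text{Aspect} = \operatorname{atan}\left(\frac{0.229}{-0.329}\right) = \operatorname{atan}(-0.69) = -34.8^\circ$$

$$\text{Aspect} = -34.8^\circ + 180^\circ = 145.2^\circ$$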
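The worked example can be reproduced with a few lines of Python. This is a sketch only: the function name `slope_aspect` is hypothetical, the raw aspect is returned exactly as atan((dz/dx)/(dz/dy)) from the slide, and the 180° correction is applied afterwards as in the example rather than through a general quadrant rule. It assumes dz/dy is not zero.

```python
import math

def slope_aspect(window, cell_size):
    """Slope (degrees) and raw aspect (degrees) for a 3 x 3 elevation window."""
    (a, b, c), (d, _e, f), (g, h, i) = window
    dz_dx = ((a + 2 * d + g) - (c + 2 * f + i)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    rise_run = math.hypot(dz_dx, dz_dy)
    slope_deg = math.degrees(math.atan(rise_run))
    # Raw arctangent as written on the slide; assumes dz_dy != 0.
    aspect_raw = math.degrees(math.atan(dz_dx / dz_dy))
    return dz_dx, dz_dy, slope_deg, aspect_raw

window = [[80, 74, 63],
          [69, 67, 56],
          [60, 52, 48]]
dz_dx, dz_dy, slope_deg, aspect_raw = slope_aspect(window, 30)
aspect_deg = aspect_raw + 180  # correction used in the example
print(round(dz_dx, 3), round(dz_dy, 3))
print(round(slope_deg, 1), round(aspect_deg, 1))
```

Small differences in the last digit against the slide values come from rounding the intermediate results before taking the arctangent.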


Flow Direction
• Methods
• D8
• Multiple Flow Directions (MFD)
• D-Infinity (D∞)
• D8, despite its simplicity, is the standard method in
many GIS
• Flow directions can only have intervals of 45°
• Where steepest drop occurs in multiple
directions, we can either:
– Assign both these directions to cell (by summing
directions – ArcGIS does this)
– Assign 1st direction to cell
– Flag direction as undefined
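Hydrologic Slope (Flow Direction)
- Direction of Steepest Descent
Elevation grid (cell size 30) and the resulting flow direction codes:

80 74 63        4 2 4
69 67 56        2 4 4
60 52 48        1 1 4

Slopes from the centre cell (67) to its lower neighbours (diagonal distance is 30√2):

$$\frac{67 - 48}{30\sqrt{2}} \approx 0.45 \qquad \frac{67 - 52}{30} = 0.50 \qquad \frac{67 - 60}{30\sqrt{2}} \approx 0.16$$

$$\frac{67 - 56}{30} \approx 0.37 \qquad \frac{67 - 63}{30\sqrt{2}} \approx 0.09$$

The steepest descent (0.50) is towards the cell directly below, so the centre cell receives code 4.

ArcHydro Page 70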
Hydrologic Slope (Flow Direction)
- Direction of Steepest Descent
Direction codes for the eight neighbours of a cell (centre marked ·):

32   64   128
16    ·     1
 8    4     2
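A minimal D8 sketch, using the direction codes above and the 3 × 3 elevation grid from the steepest-descent example, is given below. The function name `d8_direction` and the tie-breaking behaviour (the first steepest neighbour found wins) are assumptions for illustration; a cell with no lower neighbour inside the grid returns `None`.

```python
import math

# (row offset, column offset) -> D8 code; row 0 is the top (north) row
D8_CODES = {(-1, -1): 32, (-1, 0): 64, (-1, 1): 128,
            (0, -1): 16,               (0, 1): 1,
            (1, -1): 8,   (1, 0): 4,   (1, 1): 2}

def d8_direction(grid, row, col, cell_size):
    """Return (code, slope) of the steepest downslope neighbour, or (None, 0.0)."""
    best_code, best_slope = None, 0.0
    for (dr, dc), code in D8_CODES.items():
        r, c = row + dr, col + dc
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
            distance = cell_size * math.hypot(dr, dc)  # diagonal = size * sqrt(2)
            slope = (grid[row][col] - grid[r][c]) / distance
            if slope > best_slope:
                best_code, best_slope = code, slope
    return best_code, best_slope

elevation = [[80, 74, 63],
             [69, 67, 56],
             [60, 52, 48]]
print(d8_direction(elevation, 1, 1, 30))  # (4, 0.5)
```

The bottom-right cell (48) has no lower neighbour within this small window, so its direction can only be resolved from the surrounding cells of a larger grid.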
Modelling channel
networks
• Having derived flow directions, we can derive
channel networks
• Many different approaches exist to modelling
channel networks
– Those based around focal operators e.g. local
maxima/ minima, convexities and concavities
– “Hydrologically-based” algorithms using our flow
directions
– Other approaches derived from areas such as image
processing (you have seen such methods in remote
sensing)
• We will look at the second approach
Calculating Flow Accumulation
• We will use flow direction as a starting point
• Flow accumulation is then calculated as the sum of
upstream elements draining to a cell
– For every cell count how many neighbours drain to it
• For each of these cells do the same
• Repeat until all cells are either on the boundary of the
grid or have no upstream cells
• When we calculate the flow accumulation for a cell we
can also (if we flag the contributing cells) calculate the
catchment boundary for that cell
• If we choose some pour point in the channel
network, we can calculate the catchment upstream
of that point (which defines the watershed)
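Flow accumulation from a D8 direction grid can be sketched as below. Instead of counting neighbours recursively, this version traces every cell downstream and adds one to each cell it passes through, which yields the same count of upstream cells. The function name `flow_accumulation` is hypothetical; the direction grid is the 6 × 6 example from the slides, for which this procedure reproduces the accumulation values shown there.

```python
# D8 code -> (row offset, column offset); row 0 is the top (north) row
OFFSETS = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
           16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}

def flow_accumulation(flow_dir):
    """Number of upstream cells draining into each cell of a D8 grid."""
    rows, cols = len(flow_dir), len(flow_dir[0])
    acc = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cr, cc = r, c
            # Follow the flow path; the step cap guards against loops.
            for _ in range(rows * cols):
                dr, dc = OFFSETS[flow_dir[cr][cc]]
                cr, cc = cr + dr, cc + dc
                if not (0 <= cr < rows and 0 <= cc < cols):
                    break  # the path has left the grid
                acc[cr][cc] += 1
    return acc

flow_dir = [
    [32, 16, 16, 16, 16, 16],
    [64, 32, 16, 32, 16, 16],
    [64, 64, 32, 64, 64, 32],
    [64, 32, 32, 32, 32, 32],
    [64, 32, 16, 32, 32, 32],
    [64, 16, 32, 32, 32, 16],
]
acc = flow_accumulation(flow_dir)
for row in acc:
    print(row)
```

Tracing each path separately is simple but slow on large grids; production tools typically process cells in topological order instead.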
Flow accumulation example

Flow direction (D8 codes):

32 16 16 16 16 16
64 32 16 32 16 16
64 64 32 64 64 32
64 32 32 32 32 32
64 32 16 32 32 32
64 16 32 32 32 16

Flow accumulation:

35 13 12  2  1  0
10  9  0  8  4  0
 9  4  2  2  1  0
 7  0  3  1  1  0
 2  3  1  2  0  0
 1  0  0  0  1  0

[Figure: flow direction key and the associated watershed of a highlighted cell, annotated FlowAcc = 2+1+1+2+1]


Potential problems with flow
accumulation
• If flow directions converge on a single cell, with
no outflow, then flow accumulation network is
broken
• Such locations are local minima known as sinks
and are generally considered to be undesirable
artefacts of the terrain model
• When might this not be the case?

NB: In ArcGIS, undefined flow directions are also treated as sinks


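D8 results on big grid…
[Figure: D8 flow directions on a large grid containing a lake; 59, not 8, unique values because of sinks]
… and resulting flow accumulation
network broken by sinks
[Figure: flow accumulation network; note also the problems in the lake]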
Dealing with sinks
• Two basic strategies to remove sinks:
– 1) Find the nearest adjacent cell with elevation the
same as or lower than the sink – hard code a flow
direction to it (radial search)
– 2) Increase the elevation of the sink to that of a
neighbouring cell – check neighbouring cell then
drains (sink filling) (ArcGIS approach)

Source: ArcMap help


Identifying channels

• Several approaches to identifying channels


– Use of a threshold to identify channels (e.g. cells
with a flow accumulation greater than some specific
value are flagged as streams)
– This threshold may be purely empirical (does the
stream network look realistic) or based on some
knowledge of hydrologic properties (see later)…
– A second grid can be overlain on the flow
accumulation grid to simulate input precipitation
– If we assume all this precipitation is overland flow
then channel initiation can be calculated as some
threshold of accumulated water…
Channel example

Channels identified in a filled DEM with a


simple threshold accumulation area of
250 cells.
Applications of terrain data

Three examples of the use of terrain data:


1. Compound topographic indices (use elements
derived directly from the DEM to generate a physically
meaningful property)
• Wetness index
• Stream power
Topographic Wetness Index (TWI) also known
as Topographic Flow Index (TFI)
Wetness index

$$w = \ln\left(\frac{A_S}{T \tan\beta}\right) \quad \text{or} \quad w = \ln\left(\frac{A_S}{\tan\beta}\right)$$

Here:
$A_S$ is the specific catchment area;
β is the gradient; and
T is the transmissivity of a saturated soil profile.
If we assume the soil to be homogeneous over the catchment, then the second form is used (T = 1).
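For a single cell, the wetness index can be computed as sketched below. The function name `wetness_index` and its parameter names are hypothetical; the gradient is assumed to be given in degrees, and the default transmissivity of 1 corresponds to the homogeneous-soil form. The index is undefined for perfectly flat cells (tan β = 0), which this sketch does not handle.

```python
import math

def wetness_index(specific_catchment_area, slope_deg, transmissivity=1.0):
    """w = ln(A_S / (T * tan(beta))), with beta given in degrees."""
    tan_beta = math.tan(math.radians(slope_deg))
    return math.log(specific_catchment_area / (transmissivity * tan_beta))

# A larger upslope area or a gentler slope gives a higher (wetter) index.
print(wetness_index(100.0, 45.0))
print(wetness_index(1000.0, 5.0))
```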
Wetness index
• Create slope in degrees using the Slope tool -----> slope_deg
• Create flow direction -----> flow_dir
• Convert slope (degrees) to radians using the Raster Calculator
[(slope_deg × 3.14159)/180] -----> slope_rad
• Apply the tangent function to slope (radians)
[tan(slope_rad)] -----> tan_slope_rad
• Compute the square root of tan_slope_rad
[Squareroot(tan_slope_rad)] -----> sqrt_tan_slope_rad
• Create flow accumulation
– Use flow direction (flow_dir) as input
– Use sqrt_tan_slope_rad as weighting

Wetness index
[Figure: numbered steps 1 to 7]
Wetness index…
Interpreting wetness index?
• Correlation found between w (wetness index)
and distribution of surface soil water
content in a small fallow catchment
• Indices using the product of plan curvature
(rate of change of aspect) and aspect also
found to give good correlations with surface soil
water
• What physical reasons might there be for
these correlations?
Interpolating geo-
environmental datasets

•Outline
– creating surfaces from points
– interpolation basics
– interpolation methods
– common problems

Interpolation
• Definition:
“Spatial interpolation is the procedure of estimating the values
of properties at unsampled sites within an area covered by
existing observations.” (Waters, 1989)

Interpolation is the process of estimating unknown values that fall between known values.

The unknown value of a cell is based on the values of the sample points as well as the cell's relative distance from those sample points.
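One common way to turn "values of the sample points plus relative distance" into an estimate is inverse distance weighting (IDW), which is listed in the course content. The sketch below is a minimal illustration: the function name `idw`, the `(x, y, value)` tuple layout and the default power of 2 are assumptions, not the settings of any particular GIS package.

```python
import math

def idw(samples, x, y, power=2):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) samples."""
    numerator = denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0:
            return value  # exactly on a sample point: return its value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]
print(idw(samples, 5.0, 0.0))   # midway between the samples: 15.0
print(idw(samples, 2.0, 0.0))   # closer to the first sample, so nearer to 10
```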

Interpolation
• A surface can be created from a small number
of sample points
• More sample points are better for a detailed
surface
• Sample points should be well distributed
throughout the study area
• Some areas may require a clustering of sample
points (phenomena may be transitioning or
concentrating in that area)
Spatial Autocorrelation
• Principle underlying spatial interpolation is the
First Law of Geography
• Formulated by Waldo Tobler, this law states
that “everything is related to everything else,
but near things are more related than distant
things”
• The formal property that measures the degree
to which near and distant things are related is
spatial autocorrelation

In this graphic, the darkest triangles


indicate the most influential sample
points.
Linear interpolation
• Interpolation of cell values
– A best estimate between samples
• May consider:
– Distance
– Weight
• Used for:
– Predicting
– Forecasting
– Describing
– Understanding
– Calculating
– Estimating
– Analyzing
– Explaining
[Figure: known values 1 and 2, located 1 mile apart, with the interpolated values between them: 1, 1.125, 1.25, 1.375, 1.5, 1.625, 1.75, 1.875, 2]
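The values between two known samples in linear interpolation follow from a single formula. The sketch below generates the evenly spaced estimates between known values of 1 and 2 using eight equal steps; the function name `linear_interpolate` is hypothetical.

```python
def linear_interpolate(v0, v1, fraction):
    """Value at a given fraction (0 to 1) of the way from v0 to v1."""
    return v0 + (v1 - v0) * fraction

# Eight equal steps between the known values 1 and 2.
values = [linear_interpolate(1.0, 2.0, step / 8) for step in range(9)]
print(values)
# [1.0, 1.125, 1.25, 1.375, 1.5, 1.625, 1.75, 1.875, 2.0]
```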
List of potential uses:
• Environmental data
– often collected as discrete observations at points or along transects
– examples: soil cores, soil moisture, vegetation transects, meteorological station data, etc.
• Need to convert discrete data into a continuous surface for use in GIS modelling
– Interpolation provides a means to achieve this

List of potential uses:
– wide range of applications
– to provide contours for displaying data graphically
– important in addressing problem of data availability
– quick fix for partial data coverage
– interpolation of point data to surface/polygon data
– role of filling in the gaps between observations
– to calculate some property of a surface at a given point
– to aid in the decision making process both in physical
and human geography and in related disciplines such as
mineral prospecting and resource evaluation

Sampling a surface
• A perfect surface requires an infinite number of measurements
• Therefore samples need to be significant and random, if possible
• Error increases away from sample points
Data sampling
• Method of sampling is critical for subsequent interpolation...
[Figure: sampling patterns: regular, random, transect, stratified random, cluster, contour]

Surfaces from points
[Figure: sample points and the surface created from them]
Sample Size
• Some interpolation methods allow you to control the number of sample points used to estimate cell values
• The distance to each sample point will vary depending on the distribution of the points
• Reducing the size of the sample you use will speed up the interpolation process because a smaller set of numbers will be used to estimate each cell value
Controlling sample points
for interpolation
• IDW, Spline & Kriging support control of
sample numbers
• Sample methods:
– Nearest neighbors — you choose how many
– Search radius — variable or max distance
• Returns NoData if the number of sample points is insufficient
Sample Size

• When the sample size is limited to five sample points, as in this case, only the five nearest points are used in the calculation of the estimated cell value. All other points are disregarded.

• If the search radius in this example were fixed, only the values of the sample points within the radius would be used to calculate the estimated cell value. If the search radius were variable and the minimum sample size were 8, the search radius would expand until it contained eight sample points.
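The two ways of limiting the sample (a fixed number of nearest neighbours, or a fixed search radius) can be sketched as below. The function name `select_samples`, its parameters and the use of `None` to stand in for NoData are assumptions for illustration; a variable radius that expands until enough points are found is not implemented here.

```python
import math

def select_samples(samples, x, y, k=None, radius=None, min_points=1):
    """Pick the samples used to estimate the cell at (x, y).

    samples are (sx, sy, value) tuples. Returns None (standing in for
    NoData) when fewer than min_points samples qualify.
    """
    def distance(sample):
        return math.hypot(sample[0] - x, sample[1] - y)

    chosen = sorted(samples, key=distance)
    if radius is not None:
        chosen = [s for s in chosen if distance(s) <= radius]  # fixed radius
    if k is not None:
        chosen = chosen[:k]  # k nearest neighbours
    return chosen if len(chosen) >= min_points else None

points = [(1, 0, 5.0), (2, 0, 6.0), (3, 0, 7.0), (4, 0, 8.0)]
print(select_samples(points, 0, 0, k=2))         # the two nearest points
print(select_samples(points, 0, 0, radius=2.5))  # points within 2.5 units
print(select_samples(points, 0, 0, radius=0.5))  # None: nothing in range
```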
Interpolation Barriers
• The physical, geographic barriers that exist in the
landscape, like cliffs or rivers, present a particular challenge
when trying to model a surface using interpolation; the
values on either side of a barrier that represents a sudden
interruption in the landscape are drastically different

• Most interpolators attempt to smooth over these


differences by incorporating and averaging values on both
sides of the barrier. The Inverse Distance Weighted (IDW)
and kriging methods allow you to include barriers in the
analysis

• The barrier prevents the interpolator from using sample


points on one side of it
Interpolation Barriers

Elevation values change suddenly and radically near the edge of a
cliff. When you interpolate a surface with this type of barrier, you
can't use known values at the bottom of the cliff to accurately
estimate values at the top of the cliff.
When you use a barrier with interpolation, the estimated cell value is
calculated from sample points on one side of the barrier.
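A minimal sketch of the idea, assuming the barrier is a single straight line segment and that a sample is ignored whenever the straight line from the target cell to that sample crosses the barrier. The inverse-distance weighting is the standard formula; the barrier test, function names and data are illustrative assumptions rather than the algorithm of any specific software.

```python
import math

def orient(p, q, r):
    """Sign of the turn p -> q -> r (positive for counter-clockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def crosses(p1, p2, b1, b2):
    """True if segment p1-p2 properly crosses segment b1-b2."""
    return (orient(b1, b2, p1) * orient(b1, b2, p2) < 0
            and orient(p1, p2, b1) * orient(p1, p2, b2) < 0)

def idw(target, samples, power=2, barrier=None):
    """Inverse distance weighted estimate at target from (x, y, z) samples."""
    num = den = 0.0
    for x, y, z in samples:
        # Skip samples hidden behind the barrier (assumed visibility rule)
        if barrier is not None and crosses(target, (x, y), barrier[0], barrier[1]):
            continue
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return z  # target coincides with a sample
        weight = 1.0 / d ** power
        num += weight * z
        den += weight
    return num / den if den else None  # None stands in for NoData

# Hypothetical cliff at x = 4: high ground on the left, low ground on the right
samples = [(0, 0, 100.0), (2, 0, 100.0), (6, 0, 10.0), (8, 0, 10.0)]
cliff = ((4, -5), (4, 5))
without_barrier = idw((3, 0), samples)
with_barrier = idw((3, 0), samples, barrier=cliff)
```

Without the barrier the low values at the bottom of the cliff pull the estimate at the top down; with the barrier only the two samples on the same side contribute.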
Elements of interpolation
• The known points (samples)
– Sample factors: size, limits, location, outliers
• The unknown points (interpolated values)
• Interpolation method: different methods
will (almost always) produce different
results.
Choosing Interpolation method…
• How do you choose a method of
interpolation?
• Methods of spatial interpolation:
– many different methods available
– classification according to:
 exact or approximate
 deterministic or stochastic
 local or global
 gradual or abrupt
Classification: exact or approximate
• Exact methods:
– honour all data points such that the resulting
surface passes exactly through all data points
– Retain values at known points
– appropriate for use with accurate data
• Approximate methods:
– do not honour all data points
– more appropriate when there is high degree of
uncertainty about data points
– The value of the interpolated surface at the
measured location is different from the measured
value
Classification: deterministic
or stochastic
• Deterministic methods:
– create surfaces from measured points, based on either
the extent of similarity (IDW) or degree of smoothing
(Trend).
– allow the surface to be modelled as a mathematical surface
– used when there is sufficient knowledge about the
surface being modelled
• Geostatistical (Stochastic) methods:
– used to incorporate random variation in the interpolated
surface
– based on statistics (Kriging) with advanced prediction
modeling; include a measure of the certainty or accuracy of
predictions.
Classification: local or global
• Global methods:
– single mathematical function is applied to all
points
– tends to produce smooth surfaces
• Local methods:
– single mathematical function is applied repeatedly
to subsets of the total observed points
– link regional surfaces into composite surface

Classification: gradual or
abrupt
• Gradual methods:
– produce smooth surface between data points
– appropriate for interpolating data of low local
variability
• Abrupt methods:
– produce surfaces with a stepped appearance
– appropriate for interpolating data of high local
variability or data with discontinuities

Interpolation methods
• Most GIS packages offer a number of
methods e.g.
– Natural Neighbour
– Kriging
– Thiessen Polygon
– Inverse Distance Weighted (IDW) technique
– Trend
– Spline
– Triangular Irregular Network

Interpolation algorithms in ArcGIS
– Natural Neighbors
– Minimum Curvature Spline
– Spline with Barriers
– Radial Basis Functions
– TopoToRaster
– Local Polynomial
– Global Polynomial
– Diffusion Interpolation with Barriers
– Kernel Interpolation with Barriers
– Inverse Distance Weighted
– Kriging
– Cokriging
– Moving Window Kriging
– Geostatistical Simulation
Where do I find these capabilities?

• Spatial Analyst – raster, contour
• 3D Analyst – raster, contour, TIN, terrain
• Geostatistical Analyst – raster, contour line, filled
contour polygon, point, geostatistical layer
• ArcGIS Online – filled contour polygon


Areal Interpolation

Use Areal Interpolation

Figure panels: Obesity by school zone; Obesity surface and error surface; Obesity by census block

• Statistically robust method for creating surfaces from aggregated polygon data
• And aggregating back to other polygons

Natural Neighbor

• In Natural Neighbors interpolation, the value of an estimated
location is a weighted average of the values of the natural
neighbors. The weighting is proportional to the area in the
estimation location’s Voronoi polygon that was contributed by each
natural neighbor’s polygon.

• Since the output is a raster, the estimation locations are a regularly
spaced array equal to the number of raster cells.
Thiessen Polygons
• Thiessen (Voronoi) polygons:
– assume values of unsampled locations are equal
to the value of the nearest sampled point
• Vector-based method
– regularly spaced points produce a regular
mesh
– irregularly spaced points produce a network
of irregular polygons
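The Thiessen assumption (every unsampled location takes the value of its nearest sample) can be sketched without building the polygons explicitly. The raster version below, its origin convention and the sample data are assumptions made for illustration.

```python
import math

def thiessen_value(x, y, samples):
    """Value of the sample (x, y, value) nearest to location (x, y)."""
    nearest = min(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))
    return nearest[2]

def thiessen_raster(samples, rows, cols, cell_size=1.0):
    """Rasterised Thiessen surface evaluated at cell centres.

    Assumes the raster origin is at (0, 0) and that the row index
    increases with y.
    """
    return [[thiessen_value((c + 0.5) * cell_size, (r + 0.5) * cell_size, samples)
             for c in range(cols)]
            for r in range(rows)]

# Hypothetical sample points
samples = [(1, 1, 10), (4, 1, 20), (2, 4, 30)]
surface = thiessen_raster(samples, rows=5, cols=5)
```

Cells that share the same nearest sample form one block of identical values, which is the raster equivalent of a Thiessen polygon and explains the stepped appearance of the surface.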

Thiessen polygon construction

Example Thiessen polygon

Source surface with sample points Thiessen polygons with sample points

Example TIN

Source surface with sample points Resulting TIN

TIN construction

Figure: plan view and isometric view of a TIN triangle with vertex values a, b and c and the interpolated value x
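Inside one TIN triangle the interpolated value lies on the plane through the three vertices. A minimal sketch using barycentric weights is given below; the function name and the coordinates are hypothetical, and locating the triangle that contains the point is assumed to have been done already.

```python
def tin_interpolate(p, a, b, c):
    """Linear interpolation at p = (x, y) inside triangle a, b, c.

    Each vertex is (x, y, z). Returns None when p falls outside the triangle.
    """
    x, y = p
    (xa, ya, za), (xb, yb, zb), (xc, yc, zc) = a, b, c
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    # Barycentric weights of the three vertices
    wa = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    wb = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    wc = 1.0 - wa - wb
    if min(wa, wb, wc) < 0:
        return None
    return wa * za + wb * zb + wc * zc

# Hypothetical triangle with vertex values a = 10, b = 20, c = 30
a, b, c = (0, 0, 10.0), (4, 0, 20.0), (0, 4, 30.0)
x_value = tin_interpolate((1, 1), a, b, c)
outside = tin_interpolate((4, 4), a, b, c)
```

At a vertex the weights reduce to 1, 0, 0, so the surface honours the sample values, which makes TIN interpolation an exact method.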

Spline

• Instead of averaging values, like IDW does, the Spline interpolation method fits a flexible
surface, as if it were stretching a rubber sheet across all the known point values.

• This stretching effect is useful if you want estimated values that are below the minimum
or above the maximum values found in the sample data. This makes the Spline
interpolation method good for estimating lows and highs where they are not included in
the sample data.

• However, when the sample points are close together and have extreme differences in
value, Spline interpolation doesn't work as well. This is because Spline uses slope
calculations (change over distance) to figure out the shape of the flexible rubber sheet.
Spatial moving average
• Vector and raster method:
– most common GIS method
– calculates new value of each location based on
range of values associated with neighbouring
points
– Neighbourhood determined by a filter
 size, shape and character of filter?
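A minimal sketch of a spatial moving average on a raster with a circular filter. The radius is expressed in cells (an 11x11 circular filter corresponds to a radius of 5), cells holding `None` are treated as NoData, and edge cells simply average whatever part of the filter falls inside the raster; all of these are assumptions for illustration.

```python
def spatial_moving_average(raster, radius):
    """Mean of all valid cells within `radius` cells of each cell."""
    rows, cols = len(raster), len(raster[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [raster[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                      if (rr - r) ** 2 + (cc - c) ** 2 <= radius ** 2
                      and raster[rr][cc] is not None]
            out_row.append(sum(values) / len(values) if values else None)
        out.append(out_row)
    return out

# Hypothetical 3x3 raster smoothed with a radius of one cell
raster = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
smoothed = spatial_moving_average(raster, radius=1)
```

A larger radius averages over more neighbours and produces a smoother surface, which is the effect of changing the size of the filter.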

Spatial moving average (SMA)

Example SMA (circular filter)

Figure panels: Source surface with sample points; 11x11 circular filter SMA with sample points; 21x21 circular filter SMA; 41x41 circular filter SMA
Example trend surfaces
Source surface with sample points

Linear: Goodness of fit (R2) = 45.42 %
Quadratic: Goodness of fit (R2) = 82.11 %
Cubic: Goodness of fit (R2) = 92.72 %
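How a goodness-of-fit figure of this kind can be obtained is sketched below for the simplest case, a linear trend surface z = b0 + b1*x + b2*y fitted by least squares. The sample data are hypothetical and the small solver is written out only to keep the example self-contained; higher-order surfaces add more terms to each design row.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_linear_trend(samples):
    """Least-squares coefficients (b0, b1, b2) for samples (x, y, z)."""
    rows = [(1.0, x, y) for x, y, _ in samples]
    zs = [z for _, _, z in samples]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    return solve(ata, atz)

def r_squared(samples, coef):
    """Goodness of fit: 1 - residual sum of squares / total sum of squares."""
    zs = [z for _, _, z in samples]
    mean = sum(zs) / len(zs)
    ss_tot = sum((z - mean) ** 2 for z in zs)
    ss_res = sum((z - (coef[0] + coef[1] * x + coef[2] * y)) ** 2
                 for x, y, z in samples)
    return 1.0 - ss_res / ss_tot

# Hypothetical samples lying exactly on the plane z = 2 + 3x - y
planar = [(0, 0, 2.0), (1, 0, 5.0), (0, 1, 1.0), (1, 1, 4.0), (2, 1, 7.0)]
planar_coef = fit_linear_trend(planar)
planar_r2 = r_squared(planar, planar_coef)

# Same locations with one value disturbed: a plane no longer fits exactly
bumpy = [(0, 0, 2.0), (1, 0, 5.0), (0, 1, 1.0), (1, 1, 10.0), (2, 1, 7.0)]
bumpy_r2 = r_squared(bumpy, fit_linear_trend(bumpy))
```

Multiplying R2 by 100 gives the percentage form used on the slide. Because a trend surface is a global, approximate method, R2 below 100 % means the surface does not pass through every sample.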

Effects of data uncertainty
Figure panels: Original surface; Interpolation based on 100 points with error map; Interpolation based on 10 points with error map (error scale: Low to High)
Edge effects

Figure panels: Original surface with sample points; Interpolated surface; Error map and extract (error scale: Low to High)
Visual comparisons of Interpolators

Figure panels: IDW; Spline; Kriging; Topo to Raster; Natural Neighbor; Nearest Neighbor “Thiessen” Polygon Interpolation; Spline Interpolation
Common problems
• Input data uncertainty
– Too few data points
– Limited or clustered spatial coverage
– Uncertainty about location and/or value
• Edge effects
– Need data points outside the study area to
improve interpolation and avoid distortion at
boundaries
Choosing an interpolation method
• You know nothing about your data…
– Use Natural Neighbors. It is the most conservative method and honors the
points. It assumes all highs and lows are sampled and will not create
artifacts.
• Going the next step in complexity…
– Use Kernel Interpolation
• Your surface is not continuous…
– Use Kernel Interpolation or Spline with Barriers if you know there
are faults or other discontinuities in the surface.
• Your input data is contours…
– Use TopoToRaster. It is optimized for contour input. If not creating a
DEM, turn off the drainage enforcement option.
• You want a geostatistical method
– Use Empirical Bayesian Kriging
You want a prediction standard error map
• Choose from:
• Kernel Interpolation
• Local Polynomial Interpolation
• Kriging

You want an exact interpolator that honors the points
• Choose from:
• Natural Neighbors
• Spline
• Radial Basis Function
• IDW
The high and low values have not been
sampled… and are important
• Do not use
• Natural Neighbors
• IDW

Conclusion
• Interpolation of environmental point data is
an important skill
• Many methods classified by
– local/global, approximate/exact, gradual/abrupt and
deterministic/stochastic
– choice of method is crucial to success
• Error and uncertainty
– poor input data
– poor choice/implementation of interpolation method
Resampling
• Resampling: raster-based operation that converts the pixel
size of a raster dataset from one resolution to another
– Useful when there is a need to integrate raster data from
different sources with different spatial resolutions
– Can be done separately for a data set using the resampling tool
or on the fly during data integration using the raster calculator
Resampling techniques
• Nearest Neighbour
• Bilinear Interpolation – 4 nearest pixels
• Cubic Convolution – 16 nearest pixels
• The Bilinear and Cubic options should not be used
with categorical data, since the cell values may be
altered
Nearest Neighbor Resampling with Cellsize Maximum of Inputs

100 m raster:
40 50 55
42 47 43
42 44 41

150 m raster:
4 6
2 4

Calculation on the 150 m grid:
40-0.5*4 = 38    55-0.5*6 = 52
42-0.5*2 = 41    41-0.5*4 = 39

Output (150 m):
38 52
41 39
100 m cell size raster calculation

100 m raster:
40 50 55
42 47 43
42 44 41

150 m raster:
4 6
2 4

Nearest neighbor values resampled to 100 m grid used in raster calculation:
4 6 6
2 4 4
2 4 4

40-0.5*4 = 38    50-0.5*6 = 47    55-0.5*6 = 52
42-0.5*2 = 41    47-0.5*4 = 45    43-0.5*4 = 41
42-0.5*2 = 41    44-0.5*4 = 42    41-0.5*4 = 39

Output (100 m):
38 47 52
41 45 41
41 42 39
Raster calculation with options set to 100 m grid
[snow100m] - 0.5 * [temp150m]

38 47 52
41 45 41
41 42 39

• Outputs are on 100 m grid as desired.
• How were these values obtained?
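The values can be reproduced with a short sketch. It assumes both rasters share the same upper-left corner and that a 100 m cell centre falling exactly on a 150 m cell boundary is assigned to the cell on the far side of that boundary; the tie rule is an assumption chosen to match the numbers in the worked example, not a statement about a specific GIS package.

```python
def resample_nearest(grid, in_cell, out_cell, out_rows, out_cols):
    """Nearest neighbour resampling of a 2D list to a new cell size."""
    out = []
    for r in range(out_rows):
        # Input row whose cell contains the output cell centre
        src_r = min(int((r + 0.5) * out_cell // in_cell), len(grid) - 1)
        row = []
        for c in range(out_cols):
            src_c = min(int((c + 0.5) * out_cell // in_cell), len(grid[0]) - 1)
            row.append(grid[src_r][src_c])
        out.append(row)
    return out

snow100m = [[40, 50, 55],
            [42, 47, 43],
            [42, 44, 41]]
temp150m = [[4, 6],
            [2, 4]]

# Output on the 100 m grid: resample the coarse raster first
temp_on_100m = resample_nearest(temp150m, 150, 100, 3, 3)
result_100m = [[s - 0.5 * t for s, t in zip(s_row, t_row)]
               for s_row, t_row in zip(snow100m, temp_on_100m)]

# Output on the 150 m grid (maximum of inputs): resample the fine raster
snow_on_150m = resample_nearest(snow100m, 100, 150, 2, 2)
result_150m = [[s - 0.5 * t for s, t in zip(s_row, t_row)]
               for s_row, t_row in zip(snow_on_150m, temp150m)]
```

Nearest neighbour resampling only copies existing cell values, which is why it is the safe option for categorical data.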
Reclassification
 Reclassification: raster-based analysis commonly used to
convert nominal, interval or ratio data values to
ordinal or ratio data values
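A minimal sketch of range-based reclassification, here turning ratio values (slope in degrees) into ordinal ranks. The class breaks, rank values and raster are hypothetical.

```python
def reclassify(raster, rules, nodata=None):
    """Replace each cell value using (low, high, new_value) rules.

    A rule matches when low <= value < high. Cells matching no rule
    receive `nodata`.
    """
    out = []
    for row in raster:
        new_row = []
        for value in row:
            new_value = nodata
            for low, high, replacement in rules:
                if low <= value < high:
                    new_value = replacement
                    break
            new_row.append(new_value)
        out.append(new_row)
    return out

# Hypothetical slope raster (degrees) and suitability ranks (3 = most suitable)
slope = [[2, 8],
         [15, 40]]
rules = [(0, 5, 3), (5, 20, 2), (20, 90, 1)]
suitability = reclassify(slope, rules)
```

For nominal inputs such as a soil map, the same idea applies with a lookup table from each class code to its new value instead of numeric ranges.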

Reclassification

Soil map    Reclassified soil map


Proximity Analysis (Measurement)
– Buffer; Euclidean Distance
• Proximity analysis such as Buffer (Vector-based)
and Euclidean Distance (Raster-based) computes
distance to specific features used as input in the
analysis
– Buffer: The buffer tool generates polygons around
point, line and polygon features at a specified
distance.
- Input features around
which buffers are computed
can be points, lines or polygons
Proximity Analysis (Measurement)
– Buffer
Buffers can be computed as Euclidean or
geodesic buffers, flat or round buffers, and single or
multiple ring buffers
• A Euclidean buffer measures distance in a two-dimensional
Cartesian plane, where distances are calculated between two
points on a flat surface
• A geodesic buffer accounts for the shape of the earth. Distances
are calculated between two points on a curved surface.
– Round buffers have polygons that are rounded at the edges
while flat buffers have flat edges

Flat buffer Round buffer
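The difference between the two distance models can be sketched as follows. The geodesic function uses the haversine formula on a sphere with a mean Earth radius, which is a simplifying assumption; GIS software typically works on an ellipsoid. All function names are illustrative.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (spherical model)

def euclidean_distance(p, q):
    """Planar distance between two projected points (x, y)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def geodesic_distance(lon1, lat1, lon2, lat2):
    """Great-circle (haversine) distance in metres on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def in_buffer(distance, buffer_distance):
    """A point is inside the buffer when it lies within the buffer distance."""
    return distance <= buffer_distance

planar = euclidean_distance((0, 0), (3, 4))
one_degree_lat = geodesic_distance(0, 0, 0, 1)
one_degree_lon_equator = geodesic_distance(0, 0, 1, 0)
one_degree_lon_60n = geodesic_distance(0, 60, 1, 60)
```

One degree of longitude at 60° latitude covers about half the ground distance it covers at the equator, which is why buffers computed in unprojected degrees with planar distance are distorted and a geodesic buffer is preferred for such data.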

Proximity Analysis (Measurement)
– Buffer
– A single buffer generates a single buffer ring (polygon) covering
the specified distance, while a multiple buffer generates multiple
buffer rings (polygons) covering the set of distances specified.
– E.g. to compute buffers covering 200m, 400m and 600m, using
single buffer will result in multiple (3) shapefiles, each
covering one of the distances above, whereas multiple
buffer generates a single shapefile containing three features
(polygons)
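The 200m, 400m and 600m case can be mimicked by assigning each feature to the innermost ring that contains it, given its distance from the buffered feature. This is only a sketch of the classification step, assuming non-overlapping rings; the well names and distances are hypothetical and no polygons are constructed.

```python
def assign_ring(distance, ring_distances):
    """Outer distance of the innermost ring containing `distance`, else None."""
    for outer in sorted(ring_distances):
        if distance <= outer:
            return outer
    return None

rings = [200, 400, 600]
# Hypothetical distances (metres) from wells to the buffered feature
wells = {"W1": 150.0, "W2": 200.0, "W3": 450.0, "W4": 700.0}
ring_of = {name: assign_ring(d, rings) for name, d in wells.items()}
```

A well 700m away falls outside all three rings and is assigned `None`.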

Proximity Analysis (Measurement)
–Euclidean Distance
• Euclidean Distance: computes the distance from every cell in the
output raster to the nearest source. The sources
represent objects of interest. If the source is a raster, it must
contain only one value; if it is a vector, it is internally
transformed to a raster during computation.

– If a cell is at an equal distance
to two or more sources, it is
assigned to the source that is
encountered first during the
scanning process.
– You cannot control the
scanning process
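A brute-force sketch of a Euclidean distance raster: for every cell, the distance from its centre to the centre of the nearest source cell. Representing sources as cells with the value 1 and the choice of cell size are assumptions for illustration; production tools use faster scanning algorithms.

```python
import math

def euclidean_distance_raster(source, cell_size=1.0):
    """Distance from each cell centre to the nearest source cell centre.

    `source` is a 2D list in which truthy cells are sources.
    """
    sources = [(r, c)
               for r, row in enumerate(source)
               for c, value in enumerate(row) if value]
    if not sources:
        raise ValueError("the source raster contains no source cells")
    return [[min(math.hypot(r - sr, c - sc) for sr, sc in sources) * cell_size
             for c in range(len(source[0]))]
            for r in range(len(source))]

# Hypothetical 3x3 source raster with 10 m cells and two source cells
source = [[1, 0, 0],
          [0, 0, 0],
          [0, 0, 1]]
distance = euclidean_distance_raster(source, cell_size=10.0)
```

The centre cell is equally far from both sources, which is the tie situation described above; the distance value is the same whichever source is chosen.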

Spatial Queries
– Measurement and Retrieval
• The GIS Data Storage and Retrieval Subsystem makes it
possible to retrieve data from a large amount of stored data.
• The GIS makes use of a Database Management System (DBMS),
which allows implementation of Structured Query Language
(SQL).
• Querying can be done in a number of ways
– Tuple Selection: filters through all the records and returns the records that
meet the condition of the query, e.g. Select parcel with AreaSize>1000
– Attribute projection: it requires a list of attributes, all of which are attributes
of the input table (schema). The output relation of this operator has as its
schema only the list of attributes specified in the query
– Structured Query Language (SQL): it is the most common language for
defining queries in a relational database. SQL differs from the two previous query
operators in that it is capable of handling two input relations (tables) using the
Join operator
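The three kinds of query can be tried with Python's built-in `sqlite3` module. The `Parcel` and `Owner` tables, their columns (apart from `AreaSize`, which comes from the example above) and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Owner (OwnerId INTEGER PRIMARY KEY, Name TEXT)")
cur.execute(
    "CREATE TABLE Parcel ("
    "ParcelId INTEGER PRIMARY KEY, AreaSize REAL, "
    "OwnerId INTEGER REFERENCES Owner(OwnerId))"
)
cur.executemany("INSERT INTO Owner VALUES (?, ?)", [(1, "Ade"), (2, "Bola")])
cur.executemany(
    "INSERT INTO Parcel VALUES (?, ?, ?)",
    [(10, 850.0, 1), (11, 1200.0, 2), (12, 1500.0, 1)],
)

# Tuple selection: whole records that meet the condition
selection = cur.execute(
    "SELECT * FROM Parcel WHERE AreaSize > 1000 ORDER BY ParcelId"
).fetchall()

# Attribute projection: every record, but only the listed attributes
projection = cur.execute(
    "SELECT ParcelId, AreaSize FROM Parcel ORDER BY ParcelId"
).fetchall()

# Join: combine two relations through the foreign key / primary key pair
joined = cur.execute(
    "SELECT Parcel.ParcelId, Owner.Name FROM Parcel "
    "JOIN Owner ON Parcel.OwnerId = Owner.OwnerId "
    "WHERE Parcel.AreaSize > 1000 ORDER BY Parcel.ParcelId"
).fetchall()
conn.close()
```

Selection reduces the number of rows, projection reduces the number of columns, and the join brings in attributes stored in a second table.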

Spatial Queries – Measurement and Retrieval
Tuple Selection    Attribute Projection

SQL Join using primary key and foreign key
Spatial Data Integration

Spatial Data Integration Methods
 Spatial data integration is mostly achieved using the overlay analysis tools.
Overlay can be done in vector format or raster format depending on the
purpose of the analysis and the suitability of the input data.
 Some overlay analyses are better done using vector overlay, while others are better done using
raster overlay
 Overlay analysis involves the integration of two or more data sets/factors to derive
new information. It is used to model/predict a phenomenon of interest

• Vector Overlay
– Clip
– Dissolve
– Merge
– Union
– Intersect
• Raster Overlay
– Map Algebra
– Weighted Overlay
– Weighted Sum
– Fuzzy Overlay
– Linear Regression (R, Python)
– Logistic Regression (R, Python)
– Weight of Evidence (R, Python, R+ArcGIS)

Vector Overlay

Vector Overlay: Example of Intersect
Locality ∩ Soil

Locality layer (0°E to 10°E)
ID  Local_name
1   Alagbaka
2   Oja
3   FUTA
4   Ibule
5   Aule
6   Orita Obele
7   Ipinsha
8   Ilara

Soil layer (5°E to 15°E)
ID  Soil_name
1   Sandy Clay
2   Clayey
3   Sandy
4   Loamy
5   Clayey sands
6   Loamy sands
7   Silty
8   Silty sands

Output layer (5°E to 10°E): the parts of FUTA, Ibule, Ipinsha and Ilara that overlap the soil layer
Vector Overlay: Union Example

Vector Overlay: Example of Union
Locality ∪ Soil

Locality layer (0° to 10°)
ID  Local_name
1   Alagbaka
2   Oja
3   FUTA
4   Ibule
5   Aule
6   Orita Obele
7   Ipinsha
8   Ilara

Soil layer (5° to 10°)
ID  Soil_name
1   Sandy Clay
2   Clayey
3   Sandy
4   Loamy

Output layer (0° to 10°)
ID  Local_name   Soil_name
1   Alagbaka
2   Oja
3   FUTA         Sandy Clay
4   FUTA         Clayey
5   Ibule        Clayey
6   Ibule        Sandy
7   Aule
8   Orita Obele
9   Ipinsha      Loamy
10  Ipinsha      Sandy Clay
11  Ilara        Sandy
12  Ilara        Loamy
Vector Overlay: Example of Union
Input layers: Soil and Land use/cover

Soil layer (0°E to 10°E)
ID  Soil_type
1   Sand
2   Sandy Loam
3   Loam
4   Loamy Clay
5   Clayey Loam
6   Clay

Land use/cover layer (5°E to 15°E)
ID  Land use/cover
1   Farmland
2   Grassland
3   Bare ground
4   Built-up
5   Grassland
6   Forest

Output layer (0°E to 15°E): 16 polygons, numbered 1 to 16
Raster Overlay
Raster overlay performs mathematical operations on the corresponding pixels of
the input raster files. Raster overlay can be carried out in a number of ways
1. Using Raster Calculator
2. Boolean Operation
3. Using Weighted Overlay
4. Using Fuzzy Overlay
Raster Overlay

Figure: cell-by-cell addition of corresponding pixels (1 + 0, 0 + 0, 1 + 1)
Raster Overlay using Calculator
Cell by cell evaluation of mathematical functions

Runoff generation processes (figure labels: P, f, qo, qr, qs)
• Infiltration excess overland flow, aka Horton overland flow
• Partial area infiltration excess overland flow
• Saturation excess overland flow

Example
Precipitation - Losses (Evaporation, Infiltration) = Runoff

5 6     3 3     2 3
7 6  -  2 4  =  5 2
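The runoff example can be written as a cell-by-cell operation on two aligned rasters. The helper name `raster_calc` is hypothetical; the values are the precipitation and losses grids from the example.

```python
def raster_calc(a, b, op):
    """Apply `op` to each pair of corresponding cells of two aligned rasters."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("rasters must have the same number of rows and columns")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

precipitation = [[5, 6],
                 [7, 6]]
losses = [[3, 3],
          [2, 4]]  # evaporation and infiltration

runoff = raster_calc(precipitation, losses, lambda p, loss: p - loss)
```

Any other cell-wise function (addition, multiplication, a logical test) can be passed in place of the subtraction, provided the rasters share the same extent and cell size.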
Raster Overlay

0 0 0
0 0 1
0 0 1

0 1 1
1 1 1
1 0 1
Raster Overlay: Weighted Overlay
Each raster is assigned a weight by which every pixel value is multiplied before
the arithmetic operation with the other raster(s) is carried out
Raster Overlay: Weighted Overlay
Weighted overlay is used when the factors to be combined have varying influence on the
phenomenon we are trying to model/predict
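A minimal sketch of the weighting step, assuming the input rasters have already been reclassified to a common rank scale and that the weights are expressed as fractions summing to 1. The raster names, ranks and weights are hypothetical, and the result is left unrounded.

```python
def weighted_overlay(rasters, weights):
    """Weighted sum of aligned rasters, one weight per raster."""
    if len(rasters) != len(weights):
        raise ValueError("one weight is required per raster")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [[sum(w * raster[r][c] for raster, w in zip(rasters, weights))
             for c in range(cols)]
            for r in range(rows)]

# Hypothetical ranked factors: slope has more influence than soil
slope_rank = [[1, 3],
              [2, 3]]
soil_rank = [[3, 1],
             [2, 2]]
suitability = weighted_overlay([slope_rank, soil_rank], [0.6, 0.4])
```

With equal weights this reduces to a simple average, so the weights are what express the varying influence of each factor.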
Raster Overlay
Network Analysis
• Point-to-point analysis: computes
the optimal path/route between
two points. This type of analysis
includes shortest distance, fastest
route and find nearest

• Location Allocation/Finding
Coverage: In this type of network analysis,
drive-time areas correspond to the distance
that can be reached within a specific
amount of time.
– Service Areas – Which houses are within 5, 10,
and 15 minutes from a fire station or a school
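Both the optimal path cost and a service area can be derived from the same least-cost search over the network. The sketch below uses Dijkstra's algorithm on a small hypothetical road graph whose edge weights are travel times in minutes; the node names and times are made up for illustration.

```python
import heapq

def shortest_costs(graph, source):
    """Dijkstra: least travel cost from `source` to every reachable node."""
    costs = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < costs.get(neighbour, float("inf")):
                costs[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return costs

def service_area(graph, facility, max_cost):
    """Nodes reachable from `facility` within `max_cost`."""
    return {n for n, c in shortest_costs(graph, facility).items() if c <= max_cost}

# Hypothetical network: travel time in minutes between junctions
roads = {
    "station": {"A": 4, "B": 7},
    "A": {"station": 4, "B": 2, "C": 5},
    "B": {"station": 7, "A": 2, "D": 6},
    "C": {"A": 5, "D": 3},
    "D": {"B": 6, "C": 3},
}
costs = shortest_costs(roads, "station")
within_5 = service_area(roads, "station", 5)
within_10 = service_area(roads, "station", 10)
within_15 = service_area(roads, "station", 15)
```

The fastest route to B goes through A (6 minutes) rather than along the direct 7-minute road, and the 5, 10 and 15 minute service areas grow outward from the station.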
Network Analysis
• Optimize Fleet: This tool is ideal
when your main goal is to service a set
of orders, as in the traveling salesperson
problem. You can also minimize the
overall operating cost by managing sets
of vehicles and drivers. The purpose of
this network analysis tool is to find the
most efficient route for delivery, repair,
transit, or any type of fleet service

• Optimal Site Selection: takes
into account the demand to locate the best
location given several facilities. For example,
it can help decide where to build new
hospitals depending on existing hospitals and
the available demand
Network Analysis
• Origin-Destination – OD Cost Matrix:
This type of network analysis uses two sets of
locations to find the distances between all of
the locations in the two sets, e.g. stores and
warehouses
– In ArcGIS, this is the OD Cost Matrix, which
measures the least cost path from multiple origin
points to multiple destinations
• Trace Analysis:
Acknowledgement
• Ross Purves - Working with terrain models
• Steve Kopp & Steve Lynch – Creating Surfaces
• Interpolation tool
• Interpolating environmental datasets: GEOG2590 - GIS for
Physical Geography
• Spatial Analysis Using Grids
