RSG304 Sdai
B. J. Fagbohun
Course Content
• Spatial Phenomena – Geographic Objects, Geographic Fields (discrete and continuous)
• Spatial Data Models – Vector, Tessellation, Triangular Irregular Network (TIN)
• Manual and Automatic Catchment (Watershed) Delineation
• Surface Analysis – Slope, Aspect, Curvature, Flow Direction, Flow Accumulation, Topographic Wetness Index (TWI) (using Map Algebra)
• Interpolation – Kriging, Spline, Trend Surface, Inverse Distance Weighted (IDW), Thiessen Polygon, TIN
• Classification – Reclassification, Resampling
• Proximity Analysis (Measurement) – Buffer, Euclidean Distance
• Spatial Queries – Measurement and Retrieval
• Vector Overlay – Clip, Dissolve, Merge, Union, Intersect
• Raster Overlay – Raster Overlay using Map Algebra, Weighted Overlay, Weighted Sum, Fuzzy Overlay
• Network Analysis – Optimal-Path Finding, Network Partitioning, Network Allocation, Trace Analysis
• Diffusion Computation, Flow Computation
Representation of Geographic Phenomena
Geographic phenomena are classified into two types:
1. Geographic (discrete) objects
2. Geographic fields: (a) discrete fields
(b) continuous fields
Geographic Field
A geographic field has a value at every point within the study area.
It can be described by a mathematical function that assigns a specific value to any position in the study area.
A geographic field can be a discrete or a continuous field.
• A discrete field divides the study space into mutually exclusive, bounded parts, with all locations in one part having the same value.
Examples of discrete fields are rock types and soil types.
• In a continuous field, the value varies continuously with position; for example, integrating the field along a line of constant y gives $f(y) = \int_x f(x, y)\,dx$.
GIS Subsystems
Data Input Subsystem
Data Storage and Retrieval Subsystem
Data Manipulation and Analysis Subsystem
Data Output Subsystem
Data Values
Different types of data values may represent the attributes of a geographic phenomenon. The data values determine the kinds of analyses that can be done on the data.
1. Nominal: provides a name or identifier to discriminate between values. No true computation can be done on it. Sometimes called categorical data, e.g. the name of a road.
2. Ordinal: data that can be put in a sequence (ranked) but permit no other type of computation, e.g. roads can be classified as highway, township street, road and footpath based on width.
3. Interval: numerical values that permit simple computation of addition and subtraction only. There is no arithmetic (absolute) zero, e.g. temperature, dates.
4. Ratio: numerical values that permit most types of arithmetic computation. There is an arithmetic (absolute) zero.
Example attribute record:
Start X | Start Y | Road Name | Road Class | Year Built | Length (km)
2526663 | 4585966 | Ilesha-Ife Expressway | Highway | 2002 | 800
Regular Tessellation
Tessellation:
Partitions the study space into mutually exclusive cells.
A thematic value is assigned to each cell, which characterizes that part of the space.
A tessellation can be regular or irregular.
Regular tessellation: cells have the same size and shape.
Regular Tessellation: RASTER
Since the value of a cell applies to the entire cell area, the problem of cell boundaries has to be solved.
For raster cells, the convention is that the lower and left boundaries belong to a cell, and the lower-left corner carries the cell's value.
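The boundary convention can be sketched as a function that finds the cell containing a coordinate. This is a minimal sketch, assuming a raster with square cells whose origin is its lower-left corner and whose rows are counted upward from the bottom; the function name cell_index is hypothetical. Taking the floor places a point that lies on a cell's lower or left boundary in that cell.

```python
import math


def cell_index(x, y, x_origin, y_origin, cell_size):
    """Return (row, col) of the cell containing point (x, y).

    Rows are counted upward from the lower-left origin. Taking the floor
    assigns a point on a cell's lower or left boundary to that cell, and a
    point on its upper or right boundary to the neighbouring cell.
    """
    col = math.floor((x - x_origin) / cell_size)
    row = math.floor((y - y_origin) / cell_size)
    return row, col


# A point on the boundary between columns 0 and 1 belongs to column 1,
# because that boundary is the left boundary of column 1.
print(cell_index(30.0, 0.0, 0.0, 0.0, 30.0))  # (0, 1)
```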
Irregular Tessellation: Quadtree
The most common irregular tessellation is the quadtree: it repeatedly partitions the space into four quadrants, and the partitioning stops only when the entire cell (quadrant) has the same value.
Neighbouring cells with equal values are merged.
The procedure produces an upside-down tree-like structure, hence the name quadtree.
(Figure: quadtree partition into NW, NE, SW and SE quadrants.)
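The stopping rule can be sketched as a recursive function on a square raster whose side is a power of two. This is a minimal sketch, assuming row 0 is the northern row; a homogeneous block becomes a leaf holding its value, otherwise it is split into NW, NE, SW and SE children. Merging happens implicitly, because a block whose four quadrants are all equal is never split. The function name is hypothetical.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Region quadtree of a square 2^n x 2^n grid (row 0 = north).

    A homogeneous block becomes a leaf holding its value; any other block is
    split into NW, NE, SW and SE quadrants, which are processed the same way.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first
    half = size // 2
    return {
        "NW": build_quadtree(grid, row, col, half),
        "NE": build_quadtree(grid, row, col + half, half),
        "SW": build_quadtree(grid, row + half, col, half),
        "SE": build_quadtree(grid, row + half, col + half, half),
    }


GRID = [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 3, 4],
        [3, 3, 3, 3]]
print(build_quadtree(GRID))
```

Only the SE quadrant of this grid is split a second time, because it is the only block that mixes values.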
Vector
Georeferences are explicitly associated with geographic phenomena.
A vertex is used to represent a point, and a sequence of vertices to represent a line.
Each vertex consists of an (x, y) coordinate pair in 2D space and an (x, y, z) triplet in 3D space.
Point: used to represent objects that can best be described as 0-dimensional. Whether an object is represented as a point depends on the purpose of the application and on the extent of the object compared to the scale used. Represented with a node/vertex.
Line: used for phenomena that can best be described as 1-dimensional. Represented with two end nodes and zero or more internal nodes.
The straight part of a line between two nodes is called a line segment.
Area: represented using an arc/node structure that determines the area's boundary. The boundary model is also used to store areas in the case of discrete fields.
Vector: line and area
Vector: Topology
Topological Relationships
The mathematical properties of the geometric space used for spatial data may be described as follows:
1. The space is a 3D Euclidean space in which we can determine, for every point, a coordinate triplet (x, y, z) of real numbers. In this space, we can define features such as points, lines, polygons and volumes as geometric primitives of the respective dimension. A point is zero-dimensional, a line is one-dimensional, a polygon is two-dimensional and a volume is three-dimensional.
2. The space is a metric space, which means that the distance between two points can be computed according to a given distance function.
3. The space is a topological space, which means that for every point in the space we can find a neighbourhood around it that fully belongs to that space.
4. Interiors and boundaries are properties of spatial features that remain invariant under topological mappings.
Vector: Topology
In topological space, (geographic) features are defined as simplices, as these are the simplest geometric shapes of a given dimension:
– point (0-simplex)
– line segment (1-simplex)
– triangle (2-simplex)
– tetrahedron (3-simplex)
Vector: Topology
The following are topological rules in 2D space:
1. Every 1-simplex must be bounded by two 0-simplices.
2. Every 1-simplex borders two 2-simplices.
3. Every 2-simplex has a closed boundary consisting of an alternating sequence of 0-simplices and 1-simplices.
4. Around every 0-simplex there exists an alternating sequence of 1-simplices and 2-simplices.
5. 1-simplices only intersect at their bounding nodes.
Vector: Topology
Topology deals with spatial properties that are invariant under specific transformations.
It is a mathematical approach that enables data to be structured based on the principles of adjacency and connectivity.
Topological relationships are built from simple elements to more complex elements: nodes define line segments, and line segments connect to define lines, which in turn define polygons.
Line | Start node | End node | Area right | Area left | Vertex list
b1 | 4 | 1 | A | W | …
b2 | 1 | 2 | A | B | …
b4 | 2 | 4 | A | C | …
b3 | 1 | 3 | B | W | …
b6 | 3 | 2 | B | C | …
b5 | 3 | 4 | C | W | …
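Because every arc in the table records the areas on its left and right, polygon boundaries and adjacency can be derived without any coordinates. This is a minimal sketch using the table rows above, assuming W denotes the outside ("world") area; the helper names polygon_arcs and adjacent_areas are hypothetical.

```python
from collections import defaultdict

# Arc table above: line -> (start node, end node, area right, area left)
ARCS = {
    "b1": (4, 1, "A", "W"),
    "b2": (1, 2, "A", "B"),
    "b4": (2, 4, "A", "C"),
    "b3": (1, 3, "B", "W"),
    "b6": (3, 2, "B", "C"),
    "b5": (3, 4, "C", "W"),
}


def polygon_arcs(arcs):
    """Map each area to the set of arcs that bound it (on either side)."""
    bounding = defaultdict(set)
    for arc_id, (_start, _end, right, left) in arcs.items():
        bounding[right].add(arc_id)
        bounding[left].add(arc_id)
    return dict(bounding)


def adjacent_areas(arcs):
    """Two areas are adjacent when one arc has them on opposite sides."""
    return {frozenset((right, left)) for _s, _e, right, left in arcs.values()}


print(sorted(polygon_arcs(ARCS)["A"]))  # ['b1', 'b2', 'b4']
```

No geometry is stored or tested here: adjacency follows purely from the left/right attributes, which is the practical benefit of encoding topology explicitly.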
Example (weighted average):
$P = \dfrac{(1550 \times 50) + (980 \times 20) + (1250 \times 20)}{50 + 20 + 20} = \dfrac{122100}{90} = 1356.67$
Triangular Irregular Network (TIN)
A TIN can also be generated from a raster surface, since a raster is essentially a regular grid of sample points.
Spatial Data Models for Geographic Phenomena in a GIS
Both raster and vector can be used to code both fields and discrete objects.
TIN is mostly used for the representation of fields, particularly continuous fields.
• In practice, there is a strong association between raster and fields, and between vector and discrete objects.
Representation of Spatial Fields
A continuous field can be represented by means of a tessellation, a TIN or a vector representation. The choice between them is determined by the requirements of the application in mind.
It is more common to use a tessellation, notably raster, for field representation; however, a vector representation can also be used.
Use of tessellation (raster): elevation is usually represented by raster data (commonly called a DEM). Each pixel carries an elevation value.
Representation of Spatial Fields
Vector: the representation of continuous fields in vector is usually done using isolines. An isoline is a linear feature that connects points of equal value.
In the case of elevation, the isoline is called a contour.
On traditional topographic maps, elevation is usually presented as contours.
Isolines are not commonly used, but are sometimes used in geoinformation visualization.
Representation of Spatial Fields
Tessellation: in representing a discrete field as a tessellation, a group of cells is used to define each class of the field.
(Figure: Six approximate representations of a field used in GIS, including regularly spaced sample points, irregularly spaced sample points and rectangular cells. From Longley, P. A., M. F. Goodchild, D. J. Maguire and D. W. Rhind (2001), Geographic Information Systems and Science, Wiley, 454 p.)
Representation of Geographic Objects
Tessellation: remotely sensed data are an important source of data for many GIS applications. Unprocessed digital images contain pixels, each carrying a reflectance value.
Reflectance values can be processed into a classified image and stored as raster using one of several available algorithms.
Image classification assigns each pixel to one of a finite list of classes, thereby obtaining the information content of the image.
In representing geographic objects as raster:
• Lines and points appear awkward.
• Area objects are conveniently represented, although boundaries may appear jagged. This is a result of the raster resolution relative to the area size, and of artificial cell boundaries.
Vector:
Representation of Geographic Object
Point
Line
Polygon
Advantages and Disadvantages of Tessellation
• Advantages
– Due to the nature of the data storage technique, data analysis is usually easy to program and quick to perform.
– Due to the inherent nature of raster maps (e.g. one-attribute maps), raster is ideally suited for mathematical modelling and quantitative analysis.
– Discrete data as well as continuous data are equally accommodated, which facilitates the integration of the two data types.
– The geographic location of each cell is implied by its position in the matrix; other than the origin point, no other part of a raster carries a coordinate, which leads to lower storage requirements.
– Efficient for image processing.
• Disadvantages
– The cell size determines the resolution at which the data are represented.
– Processing of associated attribute data may be cumbersome if a large amount of data exists.
– A raster map reflects only one attribute or characteristic of an area.
– Cell boundaries are fixed and artificial, and are usually not representative of natural phenomena.
– Connectivity analysis is complex to execute on raster data.
Advantages and Disadvantages of Vector
• Advantages
– Data can be represented at their original resolution and form without generalization.
– Graphic output is usually aesthetically pleasing.
– Accurate geographic locations of features are maintained.
– Allows efficient encoding of topology and, as a result, more efficient operations that require topological information.
• Disadvantages
– The location of each vertex is stored explicitly, resulting in greater storage requirements.
– Continuous data such as elevation are not efficiently represented in vector form.
– Spatial analysis and filtering within polygons are impossible.
– For effective analysis, vector data must be converted into topological structures, which is processing intensive and requires extensive data cleaning.
– Algorithms for manipulation and analysis functions such as overlay are complex and may be processing intensive.
Spatial Data Analysis
Modelling surfaces
(Figure: three graphs of f(x) against x: step-wise continuous; continuous with abrupt change in slope; continuous with continuous rate of change of slope.)
You should understand what these properties mean in terms of the value and
derivatives of the function f(x) for each of these graphs. Can we calculate
f’(x) everywhere?
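3 × 3 window (e is the centre cell):
a b c
d e f
g h i
$\dfrac{dz}{dx} = \dfrac{(a + 2d + g) - (c + 2f + i)}{8 \times \text{x\_mesh\_spacing}}$
$\dfrac{dz}{dy} = \dfrac{(g + 2h + i) - (a + 2b + c)}{8 \times \text{y\_mesh\_spacing}}$
$\dfrac{\text{rise}}{\text{run}} = \sqrt{\left(\dfrac{dz}{dx}\right)^2 + \left(\dfrac{dz}{dy}\right)^2}, \qquad \text{slope (deg)} = \arctan\left(\dfrac{\text{rise}}{\text{run}}\right)$
Aspect – the steepest downslope direction:
$\text{Aspect} = \arctan\left(\dfrac{dz/dx}{dz/dy}\right)$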
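Example: Moving 3 × 3 window (cell size 30)
80 74 63 78 75 65
69 67 56 64 66 57
60 52 50 60 53 47
81 72 48 77 74 63
68 55 45 69 67 56
58 60 52 43 40 35
The window is moved across the grid one cell at a time; in its first position it covers:
a = 80, b = 74, c = 63
d = 69, e = 67, f = 56
g = 60, h = 52, i = 50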
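Example: Slope and Aspect (cell size 30)
a = 80, b = 74, c = 63
d = 69, e = 67, f = 56
g = 60, h = 52, i = 48
$\dfrac{dz}{dx} = \dfrac{(a + 2d + g) - (c + 2f + i)}{8 \times \text{x\_mesh\_spacing}} = \dfrac{[80 + (2 \times 69) + 60] - [63 + (2 \times 56) + 48]}{8 \times 30} = \dfrac{55}{240} = 0.229$
$\dfrac{dz}{dy} = \dfrac{(g + 2h + i) - (a + 2b + c)}{8 \times \text{y\_mesh\_spacing}} = \dfrac{[60 + (2 \times 52) + 48] - [80 + (2 \times 74) + 63]}{8 \times 30} = \dfrac{-79}{240} = -0.329$
$\dfrac{\text{rise}}{\text{run}} = \sqrt{(0.229)^2 + (-0.329)^2} = 0.401$
$\text{Slope} = \arctan(0.401) = 21.9^\circ$
$\text{Aspect} = \arctan\left(\dfrac{0.229}{-0.329}\right) = \arctan(-0.696) = -34.8^\circ$
Because dz/dy is negative, 180° is added to place the angle in the correct quadrant: Aspect = −34.8° + 180° = 145.2° (south-east).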
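The worked example can be checked with a short function. This is a minimal sketch that follows the sign convention of the formulas above (dz/dx is west minus east and dz/dy is south minus north, so the vector (dz/dx, dz/dy) points downslope) and uses atan2 to put the aspect in the correct quadrant, measured clockwise from north. This 1-2-1 weighting of the window is commonly known as Horn's method; the function name is hypothetical, and GIS packages may define dz/dx and dz/dy with opposite signs and adjust the aspect formula to match.

```python
import math


def horn_slope_aspect(window, cell_size):
    """Slope and aspect (degrees) of the centre cell of a 3x3 window.

    window = [[a, b, c], [d, e, f], [g, h, i]] with the first row to the north.
    Aspect is measured clockwise from north, in the range [0, 360).
    """
    (a, b, c), (d, _e, f), (g, h, i) = window
    dz_dx = ((a + 2 * d + g) - (c + 2 * f + i)) / (8 * cell_size)  # west minus east
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)  # south minus north
    slope_deg = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))  # atan(rise / run)
    # atan2 resolves the quadrant that atan(dz_dx / dz_dy) alone cannot
    aspect_deg = math.degrees(math.atan2(dz_dx, dz_dy)) % 360
    return slope_deg, aspect_deg


slope, aspect = horn_slope_aspect([[80, 74, 63], [69, 67, 56], [60, 52, 48]], 30)
print(round(slope, 1), round(aspect, 1))  # 21.9 145.2
```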
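Example: Flow direction of the centre cell (e = 67, cell size 30)
80 74 63
69 67 56
60 52 48
Drop = (67 − neighbour elevation) / distance, where the distance is 30 to edge neighbours and 30√2 ≈ 42.4 to diagonal neighbours:
to i = 48 (diagonal): 19 / 42.4 = 0.45
to h = 52 (edge): 15 / 30 = 0.50
to g = 60 (diagonal): 7 / 42.4 = 0.16
to f = 56 (edge): 11 / 30 = 0.37
to c = 63 (diagonal): 4 / 42.4 = 0.09
The steepest drop (0.50) is towards h (south), so the centre cell is assigned code 4.
Flow direction grid:
4 2 4
2 4 4
1 1 4
Hydrologic Slope (Flow Direction)
– Direction of steepest descent, coded by the eight neighbour directions around the centre cell:
32 64 128
16  ·  1
8   4   2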
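The D8 rule in the example can be sketched as a function that tests the eight neighbours of a cell and returns the code of the steepest drop. This is a minimal sketch, assuming row 0 is the northern (top) row, that only neighbours inside the grid are considered, and that a cell with no lower neighbour (a sink or flat) gets the hypothetical code 0; ties keep the first direction checked. Production tools treat edges, flats and ties more carefully.

```python
import math

# (row offset, column offset) -> D8 code; row 0 is the top (north) row
D8_CODES = {
    (0, 1): 1, (1, 1): 2, (1, 0): 4, (1, -1): 8,
    (0, -1): 16, (-1, -1): 32, (-1, 0): 64, (-1, 1): 128,
}


def d8_direction(dem, row, col, cell_size):
    """D8 code of the steepest downslope neighbour, or 0 if no neighbour is lower."""
    best_code, best_drop = 0, 0.0
    for (dr, dc), code in D8_CODES.items():
        r, c = row + dr, col + dc
        if 0 <= r < len(dem) and 0 <= c < len(dem[0]):
            # Diagonal neighbours are sqrt(2) cell sizes away
            distance = cell_size * (math.sqrt(2) if dr and dc else 1)
            drop = (dem[row][col] - dem[r][c]) / distance
            if drop > best_drop:
                best_code, best_drop = code, drop
    return best_code


DEM_WINDOW = [[80, 74, 63], [69, 67, 56], [60, 52, 48]]
print(d8_direction(DEM_WINDOW, 1, 1, 30))  # 4
```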
Modelling channel networks
• Having derived flow directions, we can derive
channel networks
• Many different approaches exist to modelling
channel networks
– Those based around focal operators e.g. local
maxima/ minima, convexities and concavities
– “Hydrologically-based” algorithms using our flow
directions
– Other approaches derived from areas such as image
processing (you have seen such methods in remote
sensing)
• We will look at the second approach
Calculating Flow Accumulation
• We will use flow direction as a starting point
• Flow accumulation is then calculated as the sum of
upstream elements draining to a cell
– For every cell count how many neighbours drain to it
• For each of these cells do the same
• Repeat until all cells are either on the boundary of the
grid or have no upstream cells
• When we calculate the flow accumulation for a cell, we can also (if we flag the contributing cells) calculate the catchment boundary for that cell
• If we choose some pour point in the channel
network, we can calculate the catchment upstream
of that point (which defines the watershed)
Flow accumulation example
Flow direction (D8 codes):
32 16 16 16 16 16
64 32 16 32 16 16
64 64 32 64 64 32
64 32 32 32 32 32
64 32 16 32 32 32
64 16 32 32 32 16
Flow accumulation (number of upstream cells draining into each cell):
35 13 12 2 1 0
10 9 0 8 4 0
9 4 2 2 1 0
7 0 3 1 1 0
2 3 1 2 0 0
1 0 0 0 1 0
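The counting procedure can be sketched recursively: a cell's accumulation is the sum, over every neighbour that drains into it, of one plus that neighbour's own accumulation. This is a minimal sketch, assuming a D8 flow direction grid with row 0 at the top and no circular flow paths; because it recurses, it suits small grids such as the example, whose accumulation values it reproduces. The names are hypothetical.

```python
# D8 code -> (row offset, column offset); row 0 is the top (north) row
OFFSETS = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
           16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}


def flow_accumulation(flow_dir):
    """Number of upstream cells that drain into each cell of a D8 grid."""
    nrows, ncols = len(flow_dir), len(flow_dir[0])
    memo = {}

    def upstream(r, c):
        if (r, c) not in memo:
            total = 0
            for code, (dr, dc) in OFFSETS.items():
                # The neighbour at (r - dr, c - dc) drains here if it carries this code
                nr, nc = r - dr, c - dc
                if 0 <= nr < nrows and 0 <= nc < ncols and flow_dir[nr][nc] == code:
                    total += 1 + upstream(nr, nc)
            memo[(r, c)] = total
        return memo[(r, c)]

    return [[upstream(r, c) for c in range(ncols)] for r in range(nrows)]


FLOW_DIR = [
    [32, 16, 16, 16, 16, 16],
    [64, 32, 16, 32, 16, 16],
    [64, 64, 32, 64, 64, 32],
    [64, 32, 32, 32, 32, 32],
    [64, 32, 16, 32, 32, 32],
    [64, 16, 32, 32, 32, 16],
]

for row in flow_accumulation(FLOW_DIR):
    print(row)
```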
(Figure: on an unfilled DEM the flow direction grid has 59, not 8, unique values because of sinks such as a lake, and the resulting flow accumulation network is broken by sinks.)
$w = \ln\left(\dfrac{A_s}{T \tan\beta}\right) \qquad \text{or} \qquad w = \ln\left(\dfrac{A_s}{\tan\beta}\right)$
Here:
A_s is the specific catchment area;
β is the gradient; and
T is the transmissivity of a saturated soil profile.
If we assume the soil to be homogeneous over the catchment, then the second form is used (T = 1).
Wetness index
Create slope in degrees using the Slope tool -----> slope_deg
Create flow direction -----> flow_dir
Convert slope (degrees) to radians using the Raster Calculator
[(slope_deg × 3.14159) / 180] -----> slope_rad
Apply the tangent function to slope (radians)
[Tan(slope_rad)] -----> tan_slope_rad
Compute the square root of tan_slope_rad
[SquareRoot(tan_slope_rad)] -----> sqrt_tan_slope_rad
Create flow accumulation
– Use flow direction (flow_dir) as input
– Use sqrt_tan_slope_rad as weighting
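The index itself can be evaluated cell by cell once flow accumulation and slope are available. This is a minimal sketch of the T = 1 form, w = ln(A_s / tan β). It assumes A_s = (flow accumulation + 1) × cell size, i.e. the upslope cells plus the cell itself, times the cell area, divided by a contour width of one cell, and it clamps flat cells to a hypothetical minimum slope of 0.1° so that tan β is never zero. It evaluates the formula directly rather than reproducing the weighted-accumulation steps listed above; the function name is hypothetical.

```python
import math


def wetness_index(flow_acc, slope_deg, cell_size, min_slope_deg=0.1):
    """Topographic wetness index w = ln(A_s / tan(beta)) with T = 1.

    flow_acc: upstream cell counts; slope_deg: slope in degrees (same grid shape).
    A_s is taken as (flow_acc + 1) * cell_size: the contributing area
    (flow_acc + 1) * cell_size**2 divided by a contour width of one cell.
    """
    result = []
    for acc_row, slope_row in zip(flow_acc, slope_deg):
        row = []
        for acc, slope in zip(acc_row, slope_row):
            specific_area = (acc + 1) * cell_size
            # Clamp flat cells so tan(beta) is never zero (hypothetical threshold)
            beta = math.radians(max(slope, min_slope_deg))
            row.append(math.log(specific_area / math.tan(beta)))
        result.append(row)
    return result


twi = wetness_index([[0, 9]], [[45.0, 45.0]], 30.0)
print(twi)  # the cell with 9 upstream cells is wetter
```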
(Figure: wetness index map.)
Wetness index…
Interpreting wetness index?
• Correlation found between w (wetness index)
and distribution of surface soil water
content in a small fallow catchment
• Indices using the product of plan curvature
(rate of change of aspect) and aspect also
found to give good correlations with surface soil
water
• What physical reasons might there be for
these correlations?
Interpolating geo-environmental datasets
• Outline
– creating surfaces from points
– interpolation basics
– interpolation methods
– common problems
Interpolation
• Definition:
“Spatial interpolation is the procedure of estimating the values
of properties at unsampled sites within an area covered by
existing observations.” (Waters, 1989)
Interpolation
• A surface can be created from a small number
of sample points
• More sample points are better for a detailed
surface
• Sample points should be well distributed
throughout the study area
• Some areas may require a clustering of sample
points (phenomena may be transitioning or
concentrating in that area)
Spatial Autocorrelation
• Principle underlying spatial interpolation is the
First Law of Geography
• Formulated by Waldo Tobler, this law states
that “everything is related to everything else,
but near things are more related than distant
things”
• The formal property that measures the degree
to which near and distant things are related is
spatial autocorrelation
Potential uses of interpolation
• List of potential uses:
– wide range of applications
– to provide contours for displaying data graphically
– important in addressing problem of data availability
– quick fix for partial data coverage
– interpolation of point data to surface/polygon data
– role of filling in the gaps between observations
– to calculate some property of a surface at a given point
– to aid in the decision making process both in physical
and human geography and in related disciplines such as
mineral prospecting and resource evaluation
Sampling a surface
• Perfect surface
requires infinite
number of
measurements
• Therefore samples
need to be
significant and
random, if possible
• Error increases away
from sample points
Data sampling
• Method of sampling is critical for
subsequent interpolation...
Surfaces from points
(Figure: sample points and the interpolated surface.)
Sample Size
• Some interpolation methods allow you to control
the number of sample points used to estimate cell
values
If the search radius in this sample were fixed, only the values
of the sample points within the radius would be used to
calculate the estimated cell value. If the search radius were
variable and the minimum sample size were 8, the search
radius would expand until it contained eight sample points.
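The effect of a fixed or variable search radius can be shown with an inverse distance weighted (IDW) estimate. This is a minimal sketch, assuming planar coordinates, a default power of 2, and that a variable radius with a minimum sample size is equivalent to using the min_points nearest samples; the function and parameter names are hypothetical, not those of any GIS tool.

```python
import math


def idw(x, y, samples, power=2, radius=None, min_points=None):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (sx, sy, value) tuples.
    radius: fixed search radius; only samples within it are used.
    min_points: variable radius; it grows until it holds min_points samples,
    which is the same as using the min_points nearest samples.
    """
    def dist(s):
        return math.hypot(s[0] - x, s[1] - y)

    ranked = sorted(samples, key=dist)
    if radius is not None:
        chosen = [s for s in ranked if dist(s) <= radius]
    elif min_points is not None:
        chosen = ranked[:min_points]
    else:
        chosen = ranked
    if not chosen:
        raise ValueError("no sample points inside the search radius")

    num = den = 0.0
    for s in chosen:
        d = dist(s)
        if d == 0:
            return s[2]  # exact method: honour the sample at its own location
        weight = 1.0 / d ** power
        num += weight * s[2]
        den += weight
    return num / den


SAMPLES = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]
print(round(idw(2.0, 0.0, SAMPLES), 3))  # 10.588
```

With a fixed radius of 3 (or a minimum sample size of 1) only the nearer sample is used, so the estimate at (2, 0) becomes exactly 10.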
Interpolation Barriers
• The physical, geographic barriers that exist in the
landscape, like cliffs or rivers, present a particular challenge
when trying to model a surface using interpolation; the
values on either side of a barrier that represents a sudden
interruption in the landscape are drastically different
Classification: exact or approximate
• Exact methods:
– honour all data points such that the resulting
surface passes exactly through all data points
– Retain values at known points
– appropriate for use with accurate data
• Approximate methods:
– do not honour all data points
– more appropriate when there is high degree of
uncertainty about data points
– The value of the interpolated surface at a measured location differs from the measured value
Classification: deterministic or stochastic
• Deterministic methods:
– create surfaces from measured points, based on either the extent of similarity (IDW) or the degree of smoothing (Trend)
– allow the surface to be modelled as a mathematical surface
– used when there is sufficient knowledge about the surface being modelled
• Geostatistical (Stochastic) methods:
– used to incorporate random variation in the interpolated surface
– based on statistics (Kriging) with advanced prediction modelling; include a measure of the certainty or accuracy of the predictions
Classification: local or global
• Global methods:
– a single mathematical function is applied to all points
– tend to produce smooth surfaces
• Local methods:
– a single mathematical function is applied repeatedly to subsets of the total observed points
– the local surfaces are linked into a composite surface
Classification: gradual or abrupt
• Gradual methods:
– produce a smooth surface between data points
– appropriate for interpolating data of low local variability
• Abrupt methods:
– produce surfaces with a stepped appearance
– appropriate for interpolating data of high local variability or data with discontinuities
Interpolation methods
• Most GIS packages offer a number of methods, e.g.
– Natural Neighbour
– Kriging
– Thiessen Polygon
– Inverse Distance Weighted (IDW)
– Trend
– Spline
– Triangular Irregular Network
Interpolation algorithms in ArcGIS
– Natural Neighbors
– Minimum Curvature Spline
– Spline with Barriers
– Radial Basis Functions
– TopoToRaster
– Local Polynomial
– Global Polynomial
– Diffusion Interpolation with
Barriers
– Kernel Interpolation with Barriers
– Inverse Distance Weighted
– Kriging
– Cokriging
– Moving Window Kriging
– Geostatistical Simulation
Where do I find these capabilities?
• Statistically robust method for creating surfaces from aggregated polygon data
Thiessen polygon construction
Example Thiessen polygon
(Figure: source surface with sample points; Thiessen polygons with sample points.)
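Thiessen polygons give every location the value of its nearest sample point, so a raster version can be built without constructing any polygon geometry. This is a minimal sketch, assuming planar coordinates, samples given as (x, y, value) tuples and hypothetical sample labels; ties go to the sample listed first.

```python
import math


def thiessen_value(x, y, samples):
    """Return the value of the Thiessen (Voronoi) polygon containing (x, y).

    Each location takes the value of its nearest sample; the polygon edges are
    therefore the perpendicular bisectors between neighbouring samples.
    """
    nearest = min(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))
    return nearest[2]


SAMPLES = [(0.0, 0.0, "A"), (10.0, 0.0, "B"), (5.0, 8.0, "C")]

# Thiessen raster: evaluate the rule at the centre of each 1 x 1 cell
grid = [[thiessen_value(col + 0.5, row + 0.5, SAMPLES) for col in range(10)]
        for row in range(8)]
for row in grid:
    print("".join(row))
```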
Example TIN
TIN construction
(Figure: the value at point x inside a triangle is interpolated from the values at its vertices a, b and c.)
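Within a TIN, each triangle is usually treated as a plane, so the value at a point inside it is a weighted average of the three vertex values, using barycentric weights. This is a minimal sketch of that linear interpolation, assuming the point lies inside (or on) the triangle and vertices are given as (x, y, value); the function name and the example vertices are hypothetical.

```python
def tin_interpolate(p, a, b, c):
    """Linear (planar) interpolation at p = (x, y) inside triangle abc.

    a, b, c are (x, y, value) vertices; the barycentric weights of p
    multiply the vertex values.
    """
    (xa, ya, za), (xb, yb, zb), (xc, yc, zc) = a, b, c
    px, py = p
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    if det == 0:
        raise ValueError("degenerate (zero-area) triangle")
    wa = ((yb - yc) * (px - xc) + (xc - xb) * (py - yc)) / det
    wb = ((yc - ya) * (px - xc) + (xa - xc) * (py - yc)) / det
    wc = 1.0 - wa - wb
    return wa * za + wb * zb + wc * zc


# Hypothetical vertices lying on the plane z = 10 + x + 2y
VERTEX_A, VERTEX_B, VERTEX_C = (0, 0, 10), (10, 0, 20), (0, 10, 30)
print(tin_interpolate((2, 3), VERTEX_A, VERTEX_B, VERTEX_C))
```

Because the vertices lie on one plane, the result at (2, 3) equals that plane's value there, 18; at a vertex the weights collapse to 1 and the surface honours the data exactly.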
Spline
• Instead of averaging values, like IDW does, the Spline interpolation method fits a flexible
surface, as if it were stretching a rubber sheet across all the known point values.
• This stretching effect is useful if you want estimated values that are below the minimum
or above the maximum values found in the sample data. This makes the Spline
interpolation method good for estimating lows and highs where they are not included in
the sample data.
• However, when the sample points are close together and have extreme differences in
value, Spline interpolation doesn't work as well. This is because Spline uses slope
calculations (change over distance) to figure out the shape of the flexible rubber sheet.
Spatial moving average
• Vector and raster method:
– most common GIS method
– calculates a new value for each location based on the range of values associated with neighbouring points
– the neighbourhood is determined by a filter: what size, shape and character should the filter have?
Spatial moving average (SMA)
Example SMA (circular filter)
(Figure: SMA surfaces computed from sample points with 11 × 11, 21 × 21 and 41 × 41 circular filters.)
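A spatial moving average over a raster of scattered samples can be sketched by averaging the sampled cells that fall inside a circular filter around each cell. This is a minimal sketch, assuming unsampled cells hold None and the filter radius is given in cells (an 11 × 11 circular filter corresponds to radius 5); the function name and sample grid are hypothetical.

```python
def spatial_moving_average(grid, radius):
    """Average of the sampled cells inside a circular filter around each cell.

    grid: 2D list with a number in sampled cells and None elsewhere.
    radius: filter radius in cells (an 11 x 11 circular filter has radius 5).
    Cells whose filter contains no samples stay None.
    """
    nrows, ncols = len(grid), len(grid[0])
    result = []
    for r in range(nrows):
        out_row = []
        for c in range(ncols):
            values = [grid[rr][cc]
                      for rr in range(max(0, r - radius), min(nrows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(ncols, c + radius + 1))
                      if (rr - r) ** 2 + (cc - c) ** 2 <= radius ** 2
                      and grid[rr][cc] is not None]
            out_row.append(sum(values) / len(values) if values else None)
        result.append(out_row)
    return result


SAMPLE_GRID = [[10, None, None],
               [None, None, None],
               [None, None, 40]]
smoothed = spatial_moving_average(SAMPLE_GRID, 2)
print(smoothed)
```

A larger filter smooths more and leaves fewer empty cells, which is the trade-off the 11 × 11, 21 × 21 and 41 × 41 examples illustrate.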
Example trend surfaces
(Figure: source surface with sample points, and three trend surfaces with goodness of fit (R²) = 45.42 %, 82.11 % and 92.72 %.)
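A trend surface is a global polynomial fitted to the sample points by least squares, and R² measures its goodness of fit. This is a minimal sketch of a first-order (planar) trend surface z = b0 + b1x + b2y, solved through the normal equations; higher-order surfaces add terms such as x², xy and y² in the same way. The function name and test points are hypothetical, and collinear points would make the system singular.

```python
def fit_linear_trend(points):
    """Least-squares first-order trend surface z = b0 + b1*x + b2*y.

    points: list of (x, y, z). Returns ([b0, b1, b2], r_squared).
    The 3 x 3 normal equations are solved by Gaussian elimination.
    """
    design = [(1.0, x, y) for x, y, _ in points]
    zs = [z for _, _, z in points]
    # Augmented matrix [A^T A | A^T z]
    m = [[sum(row[i] * row[j] for row in design) for j in range(3)]
         + [sum(row[i] * z for row, z in zip(design, zs))]
         for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, 3):
            factor = m[k][col] / m[col][col]
            m[k] = [a - factor * b for a, b in zip(m[k], m[col])]
    coef = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        coef[i] = (m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))) / m[i][i]

    # Goodness of fit: R^2 = 1 - SS_residual / SS_total
    mean_z = sum(zs) / len(zs)
    ss_res = sum((z - (coef[0] + coef[1] * x + coef[2] * y)) ** 2 for x, y, z in points)
    ss_tot = sum((z - mean_z) ** 2 for z in zs)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return coef, r_squared


PLANE_POINTS = [(0, 0, 5), (1, 0, 7), (0, 1, 2), (2, 3, 0), (1, 1, 4)]
coef, r2 = fit_linear_trend(PLANE_POINTS)
print([round(b, 6) for b in coef], round(r2, 6))  # [5.0, 2.0, -3.0] 1.0
```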
Effects of data uncertainty
(Figure: interpolation based on 100 points and the corresponding error map.)
Edge effects
(Figure: original surface with sample points, interpolated surface, and error map with extract, shaded from low to high error.)
Visual comparisons of Interpolators
(Figure: surfaces produced by IDW, Spline, Kriging, Topo to Raster, Natural Neighbor, Nearest Neighbor (“Thiessen”), Spline Interpolation and Polygon Interpolation.)
Common problems
• Input data uncertainty
– Too few data points
– Limited or clustered spatial coverage
– Uncertainty about location and/or value
• Edge effects
– Need data points outside study area
– improve interpolation and avoid distortion at
boundaries
Choosing an interpolation method
• You know nothing about your data…
– Use Natural Neighbors. It is the most conservative method and honors the points. It assumes all highs and lows are sampled and will not create artifacts.
• Going the next step in complexity…
– Use Kernel Interpolation.
• Your surface is not continuous…
– Use Kernel Interpolation or Spline with Barriers if you know there are faults or other discontinuities in the surface.
• Your input data are contours…
– Use TopoToRaster. It is optimized for contour input. If not creating a DEM, turn off the drainage enforcement option.
• You want a geostatistical method…
– Use Empirical Bayesian Kriging.
• You want a prediction standard error map…
– Choose from Kernel Interpolation, Local Polynomial Interpolation or Kriging.
Conclusion
• Interpolation of environmental point data is an important skill
• Many methods classified by
– local/global, approximate/exact, gradual/abrupt and
deterministic/stochastic
– choice of method is crucial to success
• Error and uncertainty
– poor input data
– poor choice/implementation of interpolation method
Resampling
• Resampling: a raster-based operation that converts the pixel size of raster data from one resolution to another
– Useful when raster data from different sources with different spatial resolutions need to be integrated
– Can be done separately for a dataset using the resampling tool, or on the fly during data integration using the Raster Calculator
Resampling techniques
• Nearest Neighbour
• Bilinear Interpolation – 4 nearest pixels
• Cubic Convolution – 16 nearest pixels
Example: snow100m (100 m cells) and temp150m (150 m cells)
snow100m:
40 50 55
42 47 43
42 44 41
temp150m:
4 6
2 4
Raster calculation on the 150 m grid, [snow100m] - 0.5 * [temp150m]:
40 - 0.5*4 = 38   55 - 0.5*6 = 52
42 - 0.5*2 = 41   41 - 0.5*4 = 39
Output (150 m):
38 52
41 39
Raster calculation with options set to a 100 m grid, [snow100m] - 0.5 * [temp150m]
Nearest neighbour values of temp150m resampled to the 100 m grid used in the calculation:
4 6 6
2 4 4
2 4 4
40 - 0.5*4 = 38   50 - 0.5*6 = 47   55 - 0.5*6 = 52
42 - 0.5*2 = 41   47 - 0.5*4 = 45   43 - 0.5*4 = 41
42 - 0.5*2 = 41   44 - 0.5*4 = 42   41 - 0.5*4 = 39
• Outputs are on a 100 m grid as desired:
38 47 52
41 45 41
41 42 39
• How were these values obtained?
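Both calculations can be reproduced by resampling one input to the other's cell size with nearest neighbour and then evaluating the expression cell by cell. This is a minimal sketch, assuming both grids share the same upper-left corner, rows are counted downward, and an output cell centre lying exactly on an input cell edge is taken from the cell below or to the right of that edge (which reproduces the 4 6 6 / 2 4 4 / 2 4 4 grid above); the function names are hypothetical.

```python
import math


def resample_nearest(grid, src_cell, dst_cell, nrows, ncols):
    """Nearest-neighbour resampling to a new cell size (shared upper-left origin).

    Each output cell takes the input cell that contains its centre; floor sends a
    centre lying exactly on an input edge to the cell below/right of that edge.
    """
    out = []
    for r in range(nrows):
        src_r = math.floor((r + 0.5) * dst_cell / src_cell)
        out.append([grid[src_r][math.floor((c + 0.5) * dst_cell / src_cell)]
                    for c in range(ncols)])
    return out


def snow_minus_half_temp(snow, temp):
    """Cell-by-cell evaluation of [snow] - 0.5 * [temp] on two aligned grids."""
    return [[s - 0.5 * t for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(snow, temp)]


SNOW_100M = [[40, 50, 55], [42, 47, 43], [42, 44, 41]]
TEMP_150M = [[4, 6], [2, 4]]

# Output on the 100 m grid: resample temp150m, then calculate
temp_100m = resample_nearest(TEMP_150M, 150, 100, 3, 3)
out_100m = snow_minus_half_temp(SNOW_100M, temp_100m)

# Output on the 150 m grid: resample snow100m, then calculate
snow_150m = resample_nearest(SNOW_100M, 100, 150, 2, 2)
out_150m = snow_minus_half_temp(snow_150m, TEMP_150M)

print(out_100m)  # [[38.0, 47.0, 52.0], [41.0, 45.0, 41.0], [41.0, 42.0, 39.0]]
print(out_150m)  # [[38.0, 52.0], [41.0, 39.0]]
```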
Reclassification
Reclassification: a raster-based analysis commonly used to convert nominal, interval or ratio data values into ordinal or ratio data values.
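A reclassification can be sketched as a remap table of value ranges applied to every cell. This is a minimal sketch, assuming half-open ranges (low ≤ v < high) and hypothetical slope class limits of 0 to 5°, 5 to 15° and 15 to 90°; cells that match no range become NoData (None). The function name is hypothetical.

```python
def reclassify(grid, remap, nodata=None):
    """Replace each cell value with the new value of the range it falls in.

    remap is a list of (low, high, new_value) tuples; a value v matches a
    range when low <= v < high. Values matching no range become nodata.
    """
    def lookup(value):
        for low, high, new_value in remap:
            if low <= value < high:
                return new_value
        return nodata

    return [[lookup(value) for value in row] for row in grid]


# Hypothetical slope classes: 1 = gentle, 2 = moderate, 3 = steep
SLOPE_REMAP = [(0, 5, 1), (5, 15, 2), (15, 90, 3)]
slope_deg = [[2.0, 7.5], [16.0, 4.9]]
print(reclassify(slope_deg, SLOPE_REMAP))  # [[1, 2], [3, 1]]
```

Here ratio values (slope in degrees) become ordinal classes, one of the conversions described above.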
Proximity Analysis (Measurement)
– Buffer
Buffers can be computed as Euclidean or geodesic buffers, flat or round buffers, and single or multiple ring buffers.
• A Euclidean buffer measures distance in a two-dimensional Cartesian plane, where distances are calculated between two points on a flat surface.
• A geodesic buffer accounts for the shape of the Earth. Distances are calculated between two points on a curved surface.
– Round buffers have polygons that are rounded at the ends, while flat buffers have flat (squared-off) ends.
Proximity Analysis (Measurement)
– Buffer
– A single buffer generates a single buffer ring (polygon) covering the specified distance, while a multiple ring buffer generates multiple buffer rings (polygons) covering the set of specified distances.
– E.g. to compute buffers covering 200 m, 400 m and 600 m, using single buffers results in multiple (3) shapefiles, each covering one of the distances above, whereas a multiple ring buffer generates a single shapefile containing three features (polygons).
Proximity Analysis (Measurement)
– Euclidean Distance
• Euclidean Distance: computes, for every cell in the output raster, the distance to the nearest source. The sources represent the objects of interest; if the source is a raster, it must contain only one value, and if it is a vector, it is internally converted to a raster during the computation.
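The Euclidean Distance output can be sketched by brute force: for each cell centre, take the straight-line distance to the nearest source cell centre. This is a minimal sketch, assuming the source raster is given as booleans (True marks the single source value) with square cells; it is O(cells × sources), so real tools use faster distance transforms. Thresholding the result at 200 m, 400 m and 600 m gives raster equivalents of the multiple ring buffer described above. The names are hypothetical.

```python
import math


def euclidean_distance(source, cell_size):
    """Distance from every cell centre to the nearest source cell centre."""
    sources = [(r, c) for r, row in enumerate(source)
               for c, is_source in enumerate(row) if is_source]
    if not sources:
        raise ValueError("the source raster contains no source cells")
    return [[cell_size * min(math.hypot(r - sr, c - sc) for sr, sc in sources)
             for c in range(len(source[0]))]
            for r in range(len(source))]


SOURCE = [[True, False, False],
          [False, False, False]]
dist = euclidean_distance(SOURCE, 30.0)
ring_200m = [[d <= 200.0 for d in row] for row in dist]  # cells inside a 200 m buffer
print(dist)
```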
Spatial Queries
– Measurement and Retrieval
• The GIS Data Storage and Retrieval Subsystem makes it possible to retrieve data from a large amount of stored data.
• The GIS makes use of a Database Management System (DBMS), which allows the implementation of Structured Query Language (SQL).
• Querying can be done in a number of ways:
– Tuple selection: filters through all the records and returns the records that meet the condition of the query, e.g. select parcels with AreaSize > 1000.
– Attribute projection: requires a list of attributes, all of which are attributes of the input table (schema). The output relation of this operator has as its schema only the list of attributes specified in the query.
– Structured Query Language (SQL): the most common language for defining queries in a relational database. SQL differs from the two previous query operators in that it can handle two input relations (tables) using the Join operator.
Spatial Queries – Measurement and Retrieval
(Figure: tuple selection; tuple attribute projection.)
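The three query types can be tried with SQLite from Python's standard library. This is a minimal sketch; the parcels and owners tables, their columns other than AreaSize, and all their values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables for illustration
cur.execute("CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, owner_id INTEGER, AreaSize REAL)")
cur.execute("CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO parcels VALUES (?, ?, ?)",
                [(1, 10, 850.0), (2, 11, 1200.0), (3, 10, 2400.0)])
cur.executemany("INSERT INTO owners VALUES (?, ?)",
                [(10, "Owner A"), (11, "Owner B")])

# Tuple selection: complete records that satisfy the condition
selected = cur.execute(
    "SELECT * FROM parcels WHERE AreaSize > 1000 ORDER BY parcel_id").fetchall()

# Attribute projection: only the listed attributes, for every record
projected = cur.execute(
    "SELECT parcel_id, AreaSize FROM parcels ORDER BY parcel_id").fetchall()

# Join: combine two input relations on a shared attribute
joined = cur.execute(
    "SELECT p.parcel_id, o.name FROM parcels AS p "
    "JOIN owners AS o ON p.owner_id = o.owner_id "
    "WHERE p.AreaSize > 1000 ORDER BY p.parcel_id").fetchall()

print(selected)  # [(2, 11, 1200.0), (3, 10, 2400.0)]
conn.close()
```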
Spatial Data Integration
Spatial Data Integration Methods
Spatial data integration is mostly achieved using overlay analysis tools.
Overlay can be done in vector or raster format, depending on the purpose of the analysis and the suitability of the input data.
Some overlay analyses are better done with vector overlay, while others are better done with raster overlay.
Overlay analysis involves the integration of two or more datasets/factors to derive new information. It is used to model or predict a phenomenon of interest.
• Vector Overlay
– Clip
– Dissolve
– Merge
– Union
– Intersect
• Raster Overlay
– Map Algebra
– Weighted Overlay
– Weighted Sum
– Fuzzy Overlay
– Linear Regression (R, Python)
– Logistic Regression (R, Python)
– Weight of Evidence (R, Python, R+ArcGIS)
Vector Overlay
Vector Overlay: Example of Intersect
(Figure: the Locality layer intersected (∩) with the Soil layer; the output contains only the areas covered by both layers, with the attributes of both.)
Locality attribute table:
ID | Local_name
1 | Alagbaka
2 | Oja
3 | FUTA
4 | Ibule
5 | Aule
6 | Orita Obele
7 | Ipinsha
8 | Ilara
Soil attribute table:
ID | Soil_name
1 | Sandy Clay
2 | Clayey
3 | Sandy
4 | Loamy
5 | Clayey sands
6 | Loamy sands
7 | Silty
8 | Silty sands
Vector Overlay: Example of Union
(Figure: the Locality layer combined (∪) with the Soil layer; the output keeps all areas from both layers.)
Locality attribute table:
ID | Local_name
1 | Alagbaka
2 | Oja
3 | FUTA
4 | Ibule
5 | Aule
6 | Orita Obele
7 | Ipinsha
8 | Ilara
Soil attribute table:
ID | Soil_name
1 | Sandy Clay
2 | Clayey
3 | Sandy
4 | Loamy
Union output attribute table:
ID | Local_name | Soil_name
1 | Alagbaka |
2 | Oja |
3 | FUTA | Sandy Clay
4 | FUTA | Clayey
5 | Ibule | Clayey
6 | Ibule | Sandy
7 | Aule |
8 | Orita Obele |
9 | Ipinsha | Loamy
10 | Ipinsha | Sandy Clay
11 | Ilara | Sandy
12 | Ilara | Loamy
Vector Overlay: Example of Union
(Figure: the Soil layer overlaid with the Land use/cover layer, producing output polygons numbered 1 to 16.)
Soil attribute table:
ID | Soil_type
1 | Sand
2 | Sandy Loam
3 | Loam
4 | Loamy Clay
5 | Clayey Loam
6 | Clay
Land use/cover attribute table:
ID | Land use/cover
1 | Farmland
2 | Grassland
3 | Bare ground
4 | Built-up
5 | Grassland
6 | Forest
Raster Overlay
Raster overlay performs mathematical operations on the corresponding pixels of the input raster files. Raster overlay can be carried out in a number of ways:
1. Using the Raster Calculator
2. Boolean operations
3. Using Weighted Overlay
4. Using Fuzzy Overlay
Raster Overlay
(Figure: cell-by-cell addition of input rasters, e.g. 1 + 0, 0 + 0 and 1 + 1.)
Raster Overlay using Calculator
(Figure: runoff generation processes: infiltration excess overland flow, aka Horton overland flow; partial area infiltration excess overland flow; and saturation excess overland flow. P is precipitation, f is infiltration and q_o is overland flow.)
Cell-by-cell evaluation of mathematical functions. Example:
Precipitation − Losses (Evaporation, Infiltration) = Runoff
5 6      3 3      2 3
7 6  −  2 4  =  5 2
Raster Overlay
0 0 0
0 0 1
0 0 1
0 1 1
1 1 1
1 0 1
Raster Overlay: Weighted Overlay
Each raster has an external weight by which every pixel value is multiplied before arithmetic operations with the other raster(s) are carried out.
Weighted overlay is used when the factors to be combined have varying influence on the phenomenon we are trying to model or predict.
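The weighting step can be sketched as a cell-by-cell weighted sum. This is a minimal sketch, assuming the input rasters are already aligned and rescaled to a common suitability scale (1 to 3 here) and that the weights sum to 1, in the spirit of a weighted overlay's percent influence; the factor rasters and weights are hypothetical.

```python
def weighted_overlay(rasters, weights):
    """Multiply each raster by its weight and add the results cell by cell."""
    if len(rasters) != len(weights):
        raise ValueError("one weight is needed per raster")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    nrows, ncols = len(rasters[0]), len(rasters[0][0])
    return [[sum(w * raster[r][c] for raster, w in zip(rasters, weights))
             for c in range(ncols)]
            for r in range(nrows)]


# Hypothetical factor scores (1 = low suitability, 3 = high)
slope_score = [[1, 3], [2, 3]]
soil_score = [[2, 2], [3, 1]]
landuse_score = [[3, 1], [1, 2]]
result = weighted_overlay([slope_score, soil_score, landuse_score], [0.5, 0.3, 0.2])
print(result)
```

The most influential factor (slope, weight 0.5) dominates the result, which is why the weights should reflect each factor's influence on the modelled phenomenon.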
Network Analysis
• Point-to-point analysis: computes the optimal path/route between two points. This type of analysis includes shortest distance, fastest route and find nearest.
• Location Allocation/Finding Coverage: in this type of network analysis, drive-time areas correspond to the area that can be reached within a specific amount of time.
– Service Areas – Which houses are within 5, 10 and 15 minutes of a fire station or a school?
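Both point-to-point routing and service areas rest on a shortest-path search over the network. This is a minimal sketch of Dijkstra's algorithm, assuming a directed network stored as an adjacency list with hypothetical travel times in minutes; a service area is then the set of nodes whose travel time does not exceed the cutoff.

```python
import heapq


def shortest_times(graph, start):
    """Dijkstra's algorithm: least travel time from start to every reachable node.

    graph maps each node to a list of (neighbour, travel_time) edges.
    Returns (times, previous); previous lets a route be traced back.
    """
    times = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > times[node]:
            continue  # a shorter time to this node was already found
        for neighbour, cost in graph.get(node, []):
            new_t = t + cost
            if new_t < times.get(neighbour, float("inf")):
                times[neighbour] = new_t
                previous[neighbour] = node
                heapq.heappush(queue, (new_t, neighbour))
    return times, previous


def route(previous, start, end):
    """Trace the optimal path from start to end using the previous-node links."""
    path = [end]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]


# Hypothetical one-way road network with travel times in minutes
ROADS = {
    "FireStation": [("A", 4), ("B", 2)],
    "B": [("A", 1), ("C", 7)],
    "A": [("C", 3)],
    "C": [("D", 6)],
}
times, previous = shortest_times(ROADS, "FireStation")
print(route(previous, "FireStation", "C"))  # ['FireStation', 'B', 'A', 'C']
service_area_10min = sorted(node for node, t in times.items() if t <= 10)
```

The fastest route to C goes through B and A (6 minutes) rather than directly from B (9 minutes), and node D (12 minutes) falls outside the 10-minute service area.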
Network Analysis
• Optimize Fleet: this tool is ideal when your main goal is to service a set of orders, a travelling salesperson type of problem. It can also minimize the overall operating cost by managing sets of vehicles and drivers. The purpose of this network analysis tool is to find the most efficient routes for delivery, repair, transit or any type of fleet service.
Acknowledgement
• Ross Purves - Working with terrain models
• Steve Kopp & Steve Lynch – Creating Surfaces
• Interpolation tool
• Interpolating environmental datasets: GEOG2590 - GIS for
Physical Geography
• Spatial Analysis Using Grids