Open Source Spatial Data Analytics in Python with GeostatsPy II
Spatial Uncertainty Modeling with GeostatsPy
Lecture outline . . .
• Spatial Data Declustering
• Interactive Demo with GeostatsPy
• Workflow with GeostatsPy
Michael Pyrcz, The University of Texas at Austin
Motivation
Biased, naïve statistics from biased spatial data samples
result in a biased uncertainty model.
[Figure: location map of samples 1 to 7 within the area of interest]
Lecture outline . . .
• Spatial Data Declustering
Spatial Data Collection
Subsurface data is collected to answer questions:
• how far does the contaminant plume extend? – sample peripheries
• where is the fault? – drill based on seismic interpretation
• what is the highest mineral grade? – sample the best part
• how far does the reservoir extend? – offset drilling
and to maximize value directly:
• maximize production rates
• maximize recovery of a resource
• high-grade early to capture value sooner and shorten the project payoff period
Clustered Sample
Let’s make an estimate for an Area / Volume of Interest:
• Inference of the population from a sample.
• To assess the average porosity to calculate oil in place (OIP).
[Figure: location map of samples 1 to 7 within the area of interest]
Clustered Sample
Let’s make an estimate for an Area / Volume of Interest:
[Figure: location map of samples 1 to 7, with High marked near the top-left samples and Low near the bottom]
• What if we knew from seismic that the reservoir quality
is better in the top left area?
Clustered Sample
Let’s make an estimate for an Area / Volume of Interest:
[Figure: location map of samples 1 to 12 within the area of interest]
• What if we kept drilling in the high value region of the
area of interest?
Clustered Sample
How would our estimate of average porosity change as we drilled more wells?
[Figure: well average porosity versus number of wells drilled, with the growing gap labeled Sampling Bias]
• The naïve sample average becomes more biased!
• We need a method to correct for clustered samples.
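To make the trend in the plot concrete, here is a small synthetic simulation. Everything in it is an assumption for illustration: the 100 x 100 "truth" grid, the linear porosity trend that is highest in the top-left, the five initial random wells, and the rule that every later well is drilled in the top-left quadrant. It only reproduces the qualitative behavior (the naive average drifts away from the true mean as preferential drilling continues), not the slide's data.

```python
import numpy as np

rng = np.random.default_rng(seed=73)

nx, ny = 100, 100
yy, xx = np.mgrid[0:ny, 0:nx]          # yy: y (row) index, xx: x (column) index
# Hypothetical truth: porosity rises from 0.05 (bottom right) to 0.20 (top left)
trend = 0.05 + 0.15 * ((nx - 1 - xx) + yy) / ((nx - 1) + (ny - 1))
truth = trend + rng.normal(0.0, 0.01, size=trend.shape)
true_mean = truth.mean()

# Five initial wells placed at random over the whole area of interest
wells = [(int(rng.integers(ny)), int(rng.integers(nx))) for _ in range(5)]

naive_means = []
for _ in range(45):
    # Every new well targets the high-porosity top-left quadrant
    wells.append((int(rng.integers(ny // 2, ny)), int(rng.integers(0, nx // 2))))
    values = np.array([truth[iy, ix] for iy, ix in wells])
    naive_means.append(values.mean())   # naive (equal-weight) sample average

bias = np.array(naive_means) - true_mean
print(f"true mean: {true_mean:.3f}")
print(f"naive-mean bias with 6 wells: {bias[0]:+.3f}; with {len(wells)} wells: {bias[-1]:+.3f}")
```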
Some Clustered Data
Here’s data and x-ray vision:
• Location map of 64 wells with the truth model.
• See the error between the samples and the underlying truth model.
[Figure panels: Samples and Exhaustive Truth Model; Exhaustive True Distribution; Sparse Sample Distribution]
Cell Declustering
Cell declustering, a method for calculating declustering weights:
• divide the volume of interest into a grid of cells $l = 1, \dots, L$; count the occupied cells $L_o$ and the number of data in each occupied cell $n_l$, $l = 1, \dots, L_o$; weight each datum inversely by the number of data in its cell (standardized by $L_o$)
Data weights: $w(\mathbf{u}_j) = \frac{1}{n_l} \cdot \frac{n}{L_o}$
[Figure: worked example on a cell grid, weight × (289 data / 36 cells) for cells holding different numbers of data]
Sum of all weights = n
Nominal / nonclustered weight = 1.0
All data in the same cell get the same weight.
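The weight formula translates directly into a few lines of NumPy. This is a minimal sketch, not GeostatsPy's implementation: the function name `cell_declustering_weights`, the use of square cells, and anchoring the mesh at the data minimum by default are all assumptions for illustration.

```python
import numpy as np

def cell_declustering_weights(x, y, cell_size, x0=None, y0=None):
    """Cell declustering weights w_j = (1 / n_l) * (n / L_o).

    x, y      : data coordinates (1D arrays of length n)
    cell_size : size of the square cells, in the same units as x and y
    x0, y0    : origin of the cell mesh (defaults to the data minimum)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x0 = x.min() if x0 is None else x0
    y0 = y.min() if y0 is None else y0

    # Integer cell index of each datum along x and y
    ix = np.floor((x - x0) / cell_size).astype(int)
    iy = np.floor((y - y0) / cell_size).astype(int)
    ix -= ix.min()                      # shift so indices start at 0
    iy -= iy.min()

    # One integer key per cell, then count the data in each occupied cell (n_l)
    key = ix * (iy.max() + 1) + iy
    _, cell_id, n_l = np.unique(key, return_inverse=True, return_counts=True)
    L_o = n_l.size                      # number of occupied cells

    # Inverse of the cell count, standardized so the weights sum to n
    return (1.0 / n_l[cell_id]) * (n / L_o)


# Four clustered data share one cell; one datum sits alone in another cell
x = np.array([0.1, 0.2, 0.3, 0.4, 5.5])
y = np.array([0.1, 0.2, 0.3, 0.4, 5.5])
w = cell_declustering_weights(x, y, cell_size=1.0, x0=0.0, y0=0.0)
# clustered data: (1/4) * (5/2) = 0.625 each; isolated datum: (1/1) * (5/2) = 2.5
print(w, w.sum())
```

Here $n = 5$ and $L_o = 2$, so the weights sum to $n$ and every datum in the same cell gets the same weight, as stated above.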
Declustering Weights
• Declustering weights:
1. = 1.0: nominal weight
2. < 1.0: reduced weight
3. > 1.0: increased weight
• Note: some software programs assume $\sum_{i=1}^{n} w(\mathbf{u}_i) = 1$; then the ‘nominal weight’ is $\frac{1}{n}$.
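Switching between the two conventions is a single rescaling, and it does not change any weighted statistic. A short sketch, assuming hypothetical porosity values and sum-to-$n$ weights:

```python
import numpy as np

z = np.array([0.20, 0.21, 0.19, 0.22, 0.08])             # hypothetical porosity values
w_sum_n = np.array([0.625, 0.625, 0.625, 0.625, 2.5])     # weights summing to n
n = z.size

w_sum_1 = w_sum_n / w_sum_n.sum()   # rescale to the sum-to-1 convention
nominal_sum_1 = 1.0 / n             # 'nominal' (nonclustered) weight is now 1/n

# Relative weights (weight / nominal weight) are the same in both conventions
relative = w_sum_1 / nominal_sum_1

# Rescaling does not change a weighted statistic such as the declustered mean
mean_n = np.average(z, weights=w_sum_n)
mean_1 = np.average(z, weights=w_sum_1)
print(w_sum_1, nominal_sum_1, mean_n, mean_1)
```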
Declustered Distribution
• Updated distribution with declustering weights
• Now the data file / table includes values and paired weights based on spatial arrangement.
• It is possible to calculate any weighted statistic.
– For example, the declustered mean:
$\bar{z} = \frac{\sum_{i=1}^{n} w(\mathbf{u}_i)\, z(\mathbf{u}_i)}{n}$, since $\sum_{i=1}^{n} w(\mathbf{u}_i) = n$
• Python Matplotlib’s hist function accepts a vector of weights.
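A minimal sketch of the declustered mean and a weighted histogram, using hypothetical values and weights. It uses `np.histogram` so it runs without a display; `plt.hist(z, bins=edges, weights=w)` in Matplotlib takes the same `weights` argument and draws the same bars.

```python
import numpy as np

# Hypothetical porosity values and declustering weights (sum of weights = n)
z = np.array([0.20, 0.21, 0.19, 0.22, 0.08])
w = np.array([0.625, 0.625, 0.625, 0.625, 2.5])
n = z.size

naive_mean = z.mean()
declustered_mean = np.sum(w * z) / n          # valid because sum(w) == n
assert np.isclose(declustered_mean, np.average(z, weights=w))

# Histograms: naive (each datum counts once) vs declustered (each counts w_i)
edges = np.array([0.0, 0.15, 0.30])
naive_counts, _ = np.histogram(z, bins=edges)
declustered_counts, _ = np.histogram(z, bins=edges, weights=w)
print(naive_mean, declustered_mean)
print(naive_counts, declustered_counts)
```

The four clustered high values lose weight and the isolated low value gains weight, so the declustered mean is lower than the naive mean.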
Cell-based Declustering Offsets
• The result is sensitive to the exact location of the cell mesh.
• This sensitivity is reduced by iterating over mesh positions (offsets), calculating the weights for each and averaging the results.
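One way to sketch offset averaging, assuming the mesh origin is shifted diagonally by equal fractions of a cell (a simple choice for illustration; other offset schemes are possible). The helper names `cell_weights` and `offset_averaged_weights` are hypothetical, and the data are synthetic.

```python
import numpy as np

def cell_weights(x, y, cell_size, x0, y0):
    """Cell declustering weights (1/n_l) * (n/L_o) for one mesh origin (x0, y0)."""
    ix = np.floor((x - x0) / cell_size).astype(int)
    iy = np.floor((y - y0) / cell_size).astype(int)
    ix -= ix.min()
    iy -= iy.min()
    key = ix * (iy.max() + 1) + iy                  # one integer id per cell
    _, cell_id, n_l = np.unique(key, return_inverse=True, return_counts=True)
    return (1.0 / n_l[cell_id]) * (x.size / n_l.size)

def offset_averaged_weights(x, y, cell_size, n_offsets=10):
    """Average cell declustering weights over n_offsets shifted meshes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(x.size)
    for k in range(n_offsets):
        shift = k * cell_size / n_offsets           # fraction of a cell
        w += cell_weights(x, y, cell_size, x.min() - shift, y.min() - shift)
    return w / n_offsets    # each mesh's weights sum to n, so the average does too


rng = np.random.default_rng(0)
# Hypothetical data: 40 wells clustered in one corner plus 20 spread over the area
x = np.concatenate([rng.uniform(0, 20, 40), rng.uniform(0, 100, 20)])
y = np.concatenate([rng.uniform(0, 20, 40), rng.uniform(0, 100, 20)])

w_single = cell_weights(x, y, 25.0, x.min(), y.min())
w_avg = offset_averaged_weights(x, y, cell_size=25.0)
print(w_single.sum(), w_avg.sum())
```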
Cell Size Selection
• Plot the declustered mean versus cell size for a range of cell sizes:
• There is no theory that says we should look for the minimum declustered mean when the data are clustered in high values, or the maximum when they are clustered in low values; it just seems to make sense.
• The result can be very sensitive to large-scale trends; it is often better to choose the cell size by visual inspection and some sensitivity studies.
• Alternatively, choose the cell size so that there is approximately one datum per cell in the sparsely sampled areas, i.e., the nominal data spacing.
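The cell size scan can be sketched as a loop over candidate sizes, reusing offset-averaged weights. The data (high porosity clustered in one corner), the candidate range, the diagonal offset scheme and the helper names are assumptions; taking the minimum follows the heuristic above for data clustered in high values.

```python
import numpy as np

def cell_weights(x, y, cell_size, x0, y0):
    """Cell declustering weights (1/n_l) * (n/L_o) for one mesh origin (x0, y0)."""
    ix = np.floor((x - x0) / cell_size).astype(int)
    iy = np.floor((y - y0) / cell_size).astype(int)
    ix -= ix.min()
    iy -= iy.min()
    key = ix * (iy.max() + 1) + iy
    _, cell_id, n_l = np.unique(key, return_inverse=True, return_counts=True)
    return (1.0 / n_l[cell_id]) * (x.size / n_l.size)

def declustered_mean_curve(x, y, z, cell_sizes, n_offsets=5):
    """Offset-averaged declustered mean of z for each candidate cell size."""
    means = []
    for cs in cell_sizes:
        w = np.zeros(x.size)
        for k in range(n_offsets):
            shift = k * cs / n_offsets
            w += cell_weights(x, y, cs, x.min() - shift, y.min() - shift)
        means.append(np.average(z, weights=w / n_offsets))
    return np.array(means)


rng = np.random.default_rng(1)
# Hypothetical porosity: 40 wells clustered in a high-porosity corner, 20 spread out
x = np.concatenate([rng.uniform(0, 20, 40), rng.uniform(0, 100, 20)])
y = np.concatenate([rng.uniform(0, 20, 40), rng.uniform(0, 100, 20)])
z = np.concatenate([rng.normal(0.20, 0.01, 40), rng.normal(0.10, 0.01, 20)])

cell_sizes = np.linspace(2.0, 100.0, 50)
means = declustered_mean_curve(x, y, z, cell_sizes)
# Clustered in high values, so take the minimum (a heuristic, not a theorem);
# plt.plot(cell_sizes, means) would show the full curve for visual inspection
best_cell = cell_sizes[np.argmin(means)]
print(f"naive mean {z.mean():.3f}, min declustered mean {means.min():.3f} at cell size {best_cell:.1f}")
```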
Lecture outline . . .
• Interactive Demo with
GeostatsPy
• Explore the impact of
cell size and cell
offsets
Interactive_Declustering.ipynb
Lecture outline . . .
• Workflow with GeostatsPy
Spatial Data Declustering Workflow with GeostatsPy
Let’s walk through a more thorough spatial data declustering workflow:
• calculate data weights
• visualize and QC the results
Python Jupyter declustering workflow (GeostatsPy_declustering.ipynb).
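The notebook is not reproduced here. As a hedged sketch of the “QC the results” step, assuming the weights have already been computed (with GeostatsPy or otherwise) and follow the sum-to-$n$ convention, a few basic checks might look like this. `qc_declustering` is a hypothetical helper, not part of GeostatsPy, and the values are made up.

```python
import numpy as np

def qc_declustering(z, w):
    """Summarize and sanity-check declustering weights (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    report = {
        "n": z.size,
        "sum_w": w.sum(),                          # expect n (sum-to-n convention)
        "min_w": w.min(),
        "max_w": w.max(),
        "naive_mean": z.mean(),
        "declustered_mean": np.average(z, weights=w),
    }
    if not np.isclose(report["sum_w"], report["n"]):
        raise ValueError("weights do not sum to n; check the weight convention")
    if report["min_w"] <= 0:
        raise ValueError("declustering weights must be positive")
    return report


z = np.array([0.20, 0.21, 0.19, 0.22, 0.08])       # hypothetical porosity values
w = np.array([0.625, 0.625, 0.625, 0.625, 2.5])    # hypothetical weights
report = qc_declustering(z, w)
for key, value in report.items():
    print(f"{key:>16}: {value:.4f}")
```

A large gap between the naive and declustered means, or a few very large weights, is a cue to revisit the cell size and offsets before using the weighted distribution.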
Lecture outline . . .
• Spatial Simulation
• Interactive Demo with GeostatsPy
• Workflow with GeostatsPy