SEEC Stats Toolbox
Species distribution modelling in R
Our imperfect knowledge of the natural world
Our STILL imperfect knowledge of the natural world
Rahlao et al. (2010). Weed Research 50: 537
What we actually want
Geerts et al. (2016) Biological Invasions. DOI 10.1007/s10530-016-1226-y
What is a species distribution model (SDM)?
Franklin (2009). Mapping species distributions. Cambridge University Press
Uses of SDMs
● Conservation prioritization
● Climate change predictions
● Rare species detection
● Invasive species screening
● Generating hypotheses about correlates of distribution
● Habitat suitability
Eight steps to your own SDM
1. Occurrence data
2. Study extent
3. Environmental data
4. Background samples
5. Data cleaning
6. Modelling
7. Checking your model
8. Projecting your model
Southern Bald Ibis, Geronticus calvus (Boddaert, 1783) (www.hbw.com)
Step 1: Occurrence data
● Sources:
  – Your own data
  – gbif.org/species
  – newposa.sanbi.org
● Useful to keep ID, locality, date... for data cleaning purposes
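One simple way to keep the ID, locality and date alongside the coordinates is a plain data frame. The sketch below uses invented records and hypothetical column names (`id`, `lon`, `lat`, `locality`, `date`); real downloads, for example from GBIF, will use their own field names.

```r
# Hypothetical occurrence records; all values are invented for illustration.
occ <- data.frame(
  id       = c("rec1", "rec2", "rec3", "rec3"),
  lon      = c(29.1, 28.7, 30.2, 30.2),
  lat      = c(-29.5, -28.9, -30.1, -30.1),
  locality = c("Site A", "Site B", NA, NA),
  date     = as.Date(c("2010-03-01", "2012-11-15", NA, NA)),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows, then flag records that lack the metadata
# that later data cleaning would rely on.
occ <- occ[!duplicated(occ), ]
occ$incomplete <- is.na(occ$locality) | is.na(occ$date)
print(occ)
```

Flagging rather than deleting incomplete records keeps the decision about what to discard for the data cleaning step.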
Step 2: Study extent
● Choosing an extent:
  – Size of the study area
  – Area from which background samples are selected
See Webber et al. (2011). Diversity and Distributions 17: 978
● Smaller extent:
  – Produces a smaller predicted area and may put the emphasis on incorrect predictor variables.
● Larger extent:
  – Produces a larger predicted area, which may be unrealistic, and climate variables often dominate.
Barbet-Massin et al. (2010). Ecography 33: 878
● Depends on your objective: occupied or suitable habitats (Merow et al., 2013)
  – Occupied habitat: the accessible area.
    ● Identify where a species is currently found.
    ● Identify environmentally limiting factors.
  – Suitable habitat: the area that you wish to contrast against the presences.
    ● Climate change.
    ● Invasive species.
● Useful references:
  – Anderson & Raza (2010). Journal of Biogeography 37: 1378
  – Barve et al. (2011). Ecological Modelling 222: 1810
  – Merow et al. (2013). Ecography 36: 1058
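The effect of choosing a smaller or larger extent can be sketched with a buffered bounding box around the presences. This is only one possible way to define an extent; the coordinates, the helper names `make_extent` and `ext_area`, and the buffer widths are all assumptions made for illustration.

```r
# Hypothetical presence coordinates (decimal degrees); invented values.
pres <- data.frame(lon = c(28.5, 29.3, 30.1, 29.8),
                   lat = c(-29.9, -28.7, -30.4, -29.2))

# Bounding box of the presences, expanded by a buffer on every side.
# The buffer width (in degrees) is an assumption; ideally it reflects
# the area accessible to the species.
make_extent <- function(pts, buffer_deg = 1) {
  c(xmin = min(pts$lon) - buffer_deg,
    xmax = max(pts$lon) + buffer_deg,
    ymin = min(pts$lat) - buffer_deg,
    ymax = max(pts$lat) + buffer_deg)
}

small_ext <- make_extent(pres, buffer_deg = 0.5)
large_ext <- make_extent(pres, buffer_deg = 5)

# Area in squared degrees, only to compare the two choices.
ext_area <- function(e) {
  (e[["xmax"]] - e[["xmin"]]) * (e[["ymax"]] - e[["ymin"]])
}

print(rbind(small = small_ext, large = large_ext))
print(c(small = ext_area(small_ext), large = ext_area(large_ext)))
```

Background samples would then be drawn from whichever extent matches the objective (occupied or suitable habitat).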
Step 3: Environmental data
● Sources:
  – Bioclim www.worldclim.org/bioclim
    ● Can be downloaded directly in R using the “getData” function in the raster package
  – BioOracle www.oracle.ugent.be
  – Google is your friend
● Considerations:
  – Spatial resolution (Pearson & Dawson (2003). Global Ecology & Biogeography 12: 361)
  – Temporal resolution
  – Selecting variables
    ● Include variables that directly limit a species (e.g. min. temperature)
    ● Include variables that are resources (e.g. nutrients)
    ● Include variables that link to physiology (e.g. water availability)
Step 4: Background samples
● Used when true absences are unavailable
● Background locations are considered, a priori, equally likely to contain individuals of a species
Merow et al. (2013). Ecography 36: 1058
What background samples are used for
Merow et al. (2013). Ecography 36: 1058
Step 4: Background samples
● Sampling bias
● Target Group Sampling
  – Use coordinates of related species, or of species in the same functional group, to select background
  – Accounts for sampling bias
  – Create a bias grid for Maxent
● Useful refs:
  – Hijmans et al. (2000). Conservation Biology 14: 1755
  – Elith et al. (2010). Methods in Ecology and Evolution 1: 330
  – Merow et al. (2013)
  – Phillips et al. (2009). Ecological Applications 19: 181
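The difference between random background and target-group background can be sketched in base R. Everything here is simulated: the extent, the target-group records and the 0.5-degree cell size are assumptions, and the count table is only a rough stand-in for a proper bias grid.

```r
set.seed(42)

# Hypothetical study extent (decimal degrees).
ext <- c(xmin = 27, xmax = 31, ymin = -31, ymax = -28)
n_bg <- 500

# (a) Random background: points drawn uniformly within the extent.
# Uniform in lon/lat is only approximately uniform in area.
bg_random <- data.frame(lon = runif(n_bg, ext[["xmin"]], ext[["xmax"]]),
                        lat = runif(n_bg, ext[["ymin"]], ext[["ymax"]]))

# (b) Target-group background: records of related species, simulated here
# as clustered around a hypothetical well-surveyed area, so that the
# background carries the same sampling bias as the presences.
target_group <- data.frame(lon = rnorm(2000, mean = 29.5, sd = 0.4),
                           lat = rnorm(2000, mean = -29.0, sd = 0.3))
inside <- target_group$lon >= ext[["xmin"]] & target_group$lon <= ext[["xmax"]] &
          target_group$lat >= ext[["ymin"]] & target_group$lat <= ext[["ymax"]]
target_group <- target_group[inside, ]
bg_target <- target_group[sample(nrow(target_group), n_bg), ]

# A coarse bias grid: target-group record counts per 0.5-degree cell.
bias_grid <- table(
  lon = cut(target_group$lon, seq(ext[["xmin"]], ext[["xmax"]], by = 0.5)),
  lat = cut(target_group$lat, seq(ext[["ymin"]], ext[["ymax"]], by = 0.5))
)
print(bias_grid)
```

The target-group background is concentrated where survey effort was high, which is the property that lets it offset sampling bias in the presences.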
Step 5: Data cleaning
● Worthy of a lecture on its own
● Species names
  – Synonyms
  – Misapplication
  – Useful resources:
    ● theplantlist.org
    ● www.eol.org
    ● www.ncbi.nlm.nih.gov
    ● www.itis.gov
    ● https://ropensci.org/tutorials/taxize_tutorial.html
● Coordinate errors
  – Spatial resolution
  – Zero lat/lon
  – Swapped lat and lon
  – How to detect:
    ● Points in the sea (terrestrial organisms) or on land (marine organisms)
    ● Country name mismatch
    ● Elevational mismatch
    ● Outliers with respect to environmental data
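Some of these coordinate checks need no external layers and can be sketched in base R. The records, the column names and the rectangular study region below are hypothetical; checks against coastlines, country borders or elevation need additional spatial data and are not shown.

```r
# Hypothetical records; the column names are assumptions.
occ <- data.frame(
  id  = 1:5,
  lon = c(29.1, 0, -29.4, 28.9, 200),
  lat = c(-29.5, 0, 29.8, -29.2, -30)
)

# Hypothetical study region: lon 27 to 31, lat -31 to -28.
in_region <- function(lon, lat) {
  lon >= 27 & lon <= 31 & lat >= -31 & lat <= -28
}

# Zero lat/lon, often a placeholder for missing coordinates.
occ$zero_coords <- occ$lon == 0 & occ$lat == 0

# Values that cannot be valid decimal degrees.
occ$out_of_range <- abs(occ$lon) > 180 | abs(occ$lat) > 90

# A record that falls outside the region as given, but inside it once
# lat and lon are exchanged, is a candidate for swapped coordinates.
occ$maybe_swapped <- !in_region(occ$lon, occ$lat) & in_region(occ$lat, occ$lon)

occ$flagged <- occ$zero_coords | occ$out_of_range | occ$maybe_swapped
print(occ)
```

Flagged records are best inspected by hand before they are corrected or removed, since a swapped record can often be repaired rather than discarded.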
● Pseudoreplication
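One common way to reduce pseudoreplication, offered here as an assumption rather than as the method used in the course, is to keep a single record per grid cell. The coordinates are invented and the 0.25-degree cell size is arbitrary; in practice it would usually match the resolution of the environmental layers.

```r
# Hypothetical presence records (decimal degrees); invented values.
occ <- data.frame(lon = c(29.10, 29.12, 29.13, 30.40, 30.41, 28.20),
                  lat = c(-29.55, -29.56, -29.57, -30.10, -30.12, -28.90))

# Assign each record to a grid cell. The cell size is an assumption.
cell_size <- 0.25
cell_id <- paste(floor(occ$lon / cell_size), floor(occ$lat / cell_size))

# Keep the first record in each cell.
occ_thinned <- occ[!duplicated(cell_id), ]
print(occ_thinned)
```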
● Collinearity
  – Predictors that are highly correlated with one another (e.g. |r| > 0.8)
  – Can be a problem if one wants to understand the environmental factors that limit species’ distributions (Merow et al., 2013)
  – Not a problem if predicting distribution is the sole aim (Elith et al., 2011. Diversity and Distributions 17: 43)
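A quick screen for collinear predictors can be done with a correlation matrix. The sketch below uses simulated predictors with hypothetical names (`temp_mean`, `temp_min`, `rain`) and the 0.8 threshold mentioned above; with real data the values would be extracted from the environmental layers at the presence and background points.

```r
set.seed(1)
n <- 200

# Simulated predictors; names and values are hypothetical.
temp_mean <- rnorm(n, 18, 3)
temp_min  <- temp_mean - 8 + rnorm(n, 0, 0.5)  # strongly tied to temp_mean
rain      <- rnorm(n, 600, 150)
env <- data.frame(temp_mean, temp_min, rain)

# Pearson correlation matrix of the predictors.
r <- cor(env)

# Pairs above the threshold, looking at the upper triangle only so that
# each pair is reported once.
high <- which(abs(r) > 0.8 & upper.tri(r), arr.ind = TRUE)
pairs_high <- data.frame(var1 = rownames(r)[high[, "row"]],
                         var2 = colnames(r)[high[, "col"]],
                         r    = r[high])
print(round(r, 2))
print(pairs_high)
```

If the aim is to interpret limiting factors, one variable from each flagged pair would typically be dropped, chosen on biological grounds.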
● R package biogeo
Step 6: Running a model
http://www.earthskysea.org/!ecology/sdmShortCourseKState2012/sdmShortCourse_kState.pdf
● Maxent (R package dismo)
● Check out the vignette help document on SDM from dismo (it is in the script)
● Useful refs:
  – Merow et al. (2013). Ecography 36: 1058
  – Yackulic et al. (2013). Methods in Ecology and Evolution 4: 236
● Some basic things to consider:
  – Regularization coefficient (β): penalizes overfitting, but it is user-specified.
  – Try a range of β values and evaluate model fit (e.g. using AUC) (Merow et al., 2013)
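The idea of trying a range of β values can be illustrated without Maxent itself. The sketch below is a stand-in, not Maxent: it fits a logistic regression with an L2 penalty of strength `beta` to simulated presence/background data and reports the AUC on held-out data for each value. The data, the helper names `auc` and `fit_penalized`, and the β grid are all assumptions. In dismo, the Maxent regularization multiplier is, to my understanding, passed through the `args` argument (e.g. `"betamultiplier=2"`); check the package documentation before relying on that.

```r
set.seed(1)

# Simulated presence (1) / background (0) data with five predictors, of
# which only the first two carry signal.
n <- 400
x <- matrix(rnorm(n * 5), n, 5)
y <- rbinom(n, 1, plogis(1.5 * x[, 1] - x[, 2]))
train <- sample(n, round(0.7 * n))

# AUC from ranks (Mann-Whitney statistic); ties receive average ranks.
auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Logistic regression with an L2 penalty of strength beta on the slopes.
# This is a stand-in for Maxent's regularization, not the same penalty.
fit_penalized <- function(x, y, beta) {
  X <- cbind(1, x)
  penalized_nll <- function(w) {
    eta <- drop(X %*% w)
    -sum(y * eta - log1p(exp(eta))) + beta * sum(w[-1]^2)
  }
  optim(rep(0, ncol(X)), penalized_nll, method = "BFGS")$par
}

betas <- c(0, 1, 10, 100)
results <- do.call(rbind, lapply(betas, function(b) {
  w <- fit_penalized(x[train, ], y[train], b)
  score <- drop(cbind(1, x[-train, ]) %*% w)
  data.frame(beta = b,
             slope_norm = sqrt(sum(w[-1]^2)),
             test_auc = auc(score, y[-train]))
}))
print(results)
```

Larger β values shrink the fitted coefficients; comparing the held-out AUC across the grid shows whether that shrinkage helps or hurts for the data at hand.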
Step 7: Checking your model
● Model accuracy
  – Area under the receiver operating characteristic curve (AUC)
  – Other measures exist too, and using some of them is recommended (sensitivity, specificity, Boyce Index…)
  – Usually checked by bootstrapping:
    ● e.g. 100 model runs
    ● 70% of data used to build a model (training data)
    ● 30% used to evaluate (test data)
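The 100-run, 70/30 evaluation scheme can be sketched as repeated random splits. A logistic regression (`glm`) is used here purely as a stand-in for the SDM, the data are simulated, and the 0.5 threshold used for sensitivity and specificity is an assumption.

```r
set.seed(123)

# Simulated presence (1) / background (0) data; predictor names are hypothetical.
n <- 300
dat <- data.frame(temp = rnorm(n), rain = rnorm(n))
dat$pa <- rbinom(n, 1, plogis(1.2 * dat$temp - 0.8 * dat$rain))

# AUC from ranks (Mann-Whitney statistic); ties receive average ranks.
auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

n_runs <- 100
out <- t(replicate(n_runs, {
  train_rows <- sample(n, round(0.7 * n))    # 70% training data
  fit <- glm(pa ~ temp + rain, family = binomial, data = dat[train_rows, ])
  test <- dat[-train_rows, ]                 # 30% test data
  p <- predict(fit, newdata = test, type = "response")
  pred <- p >= 0.5                           # threshold is an assumption
  c(auc         = auc(p, test$pa),
    sensitivity = mean(pred[test$pa == 1]),
    specificity = mean(!pred[test$pa == 0]))
}))

# Mean and spread of each measure across the runs.
print(round(rbind(mean = colMeans(out), sd = apply(out, 2, sd)), 3))
```

Reporting the spread across runs, not only the mean, gives a sense of how sensitive the accuracy estimate is to the particular split.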
● Response curves
  – Are they biologically realistic?
  – Is there enough sampling across the full range of the predictor?
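Both questions can be examined by plotting a response curve together with the distribution of the sampled predictor values. The sketch below again uses a `glm` as a stand-in for the SDM and simulated data with a unimodal response to a hypothetical `temp` variable; the other predictor is held at its mean.

```r
set.seed(7)

# Simulated data with a unimodal response to temp (optimum near 15).
n <- 300
dat <- data.frame(temp = runif(n, 5, 25), rain = runif(n, 200, 1000))
dat$pa <- rbinom(n, 1, plogis(2 - 0.08 * (dat$temp - 15)^2 +
                                0.001 * (dat$rain - 600)))

fit <- glm(pa ~ poly(temp, 2) + rain, family = binomial, data = dat)

# Response curve for temp: vary temp over its sampled range and hold
# rain at its mean.
grid <- data.frame(temp = seq(min(dat$temp), max(dat$temp), length.out = 50),
                   rain = mean(dat$rain))
grid$suitability <- predict(fit, newdata = grid, type = "response")

# How many records support each part of the curve?
support <- table(cut(dat$temp, breaks = seq(5, 25, by = 5),
                     include.lowest = TRUE))
print(support)

plot(grid$temp, grid$suitability, type = "l",
     xlab = "temp", ylab = "Predicted suitability")
rug(dat$temp)  # tick marks show where the predictor was actually sampled
```

Sections of the curve with few tick marks beneath them are supported by little data and deserve the most scepticism.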
Step 8: Projecting your model
● Very easy to make a map, but is it a good map?
● Model extrapolation
  – Novel environmental space
Zurell et al. (2012). Diversity and Distributions 18: 628
● Novel environmental space refs:
  – Multivariate environmental similarity surface (MESS) (Elith et al., 2010. Methods in Ecology & Evolution 1: 330)
  – Environmental overlap masks (Zurell et al., 2012. Diversity and Distributions 18: 628)
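A very simple way to flag novel environmental space is to check, variable by variable, whether each projection cell lies within the range seen in the training data. This is a simplified range check in the spirit of the approaches cited above, not an implementation of MESS or of environmental overlap masks, and all values are simulated. The dismo package provides a `mess` function for the full calculation, as far as I am aware.

```r
set.seed(99)

# Simulated environmental values at training sites and at projection cells.
# Variable names and ranges are hypothetical.
train_env <- data.frame(temp = runif(200, 10, 20), rain = runif(200, 400, 800))
proj_env  <- data.frame(temp = runif(500, 8, 26),  rain = runif(500, 300, 900))

# For each variable, TRUE where the projection value lies within the
# range of the training data.
within_range <- sapply(names(train_env), function(v) {
  rng <- range(train_env[[v]])
  proj_env[[v]] >= rng[1] & proj_env[[v]] <= rng[2]
})

# A cell counts as novel if any variable falls outside its training range.
novel <- rowSums(!within_range) > 0

cat("Proportion of projection cells in novel environmental space:",
    round(mean(novel), 2), "\n")
```

Predictions in the flagged cells are extrapolations and could be masked or shown differently on the final map.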
THE END
Other useful references
● Elith & Leathwick (2009). Annual Rev. Ecol. Evol. Syst. 40: 677
● Elith et al. (2011). Diversity and Distributions 17: 43
● Liu et al. (2013). Journal of Biogeography 40: 778
● Renner et al. (2015). Methods in Ecology and Evolution 6: 366
● And many more… (so carry on reading!)