Releases: tidymodels/spatialsample
spatialsample 0.6.0
- Fixed a bug where passing a polygon to `spatial_nndm_cv()` forced leave-one-out
  CV, rather than the intended sampling of prediction points from the polygon.
spatialsample 0.5.1
- `spatial_block_cv()` now adds an `expand_bbox` attribute to the resulting rset
  for compatibility with `rsample::reshuffle_rset()`.
- `autoplot.spatial_block_cv()` now plots the proper grid (using the new
  `expand_bbox` attribute).
spatialsample 0.5.0
- `spatial_block_cv()` gains an argument, `expand_bbox`, which represents the
  proportion a bounding box should be expanded by (each corner of the bounding
  box is expanded by `bbox_corner_value * expand_bbox`).
  - This is a breaking change for data in planar coordinate reference
    systems. Set to 0 to obtain previous behaviors.
  - Data in geographic coordinates was already having its bounding box expanded
    by the default 0.00001.
  - This makes it so that regularly spaced data is less likely to fall precisely
    along grid lines (and therefore fall into two assessment sets) and so that
    geographic data is more likely to fall within the constructed grid.
  - Thanks to Nikos on StackOverflow for reporting this behavior:
    https://stackoverflow.com/q/77374348/9625040
- `spatial_block_cv()` will now throw an error if observations are in multiple
  assessment folds (caused by observations, or observation centroids, falling
  precisely along grid polygon boundaries).
- In `spatial_nndm_cv()`, passing a single polygon (or multipolygon) to the
  `prediction_sites` argument will result in prediction sites being sampled from
  that polygon, rather than from its bounding box.
- `get_rsplit()` is now re-exported from the rsample package. This provides a
  more natural, pipe-able interface for accessing individual splits;
  `get_rsplit(rset, 1)` is identical to `rset$splits[[1]]`.
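
A minimal sketch of the two user-facing changes in this release, assuming spatialsample 0.5.0 or later is installed. It uses the bundled `boston_canopy` data; the value `0.01` for `expand_bbox`, the fold count and the seed are arbitrary choices for illustration, not recommendations.

```r
library(spatialsample)

set.seed(123)

# boston_canopy uses a projected (planar) CRS, so expand_bbox changes the
# grid compared with versions before 0.5.0. Setting expand_bbox = 0 would
# restore the old, unexpanded bounding box.
blocks <- spatial_block_cv(boston_canopy, v = 5, expand_bbox = 0.01)

# get_rsplit() is the pipe-friendly way to pull out a single split
first_split <- get_rsplit(blocks, 1)

# Equivalent to indexing the splits column directly
identical(first_split, blocks$splits[[1]])
```
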
spatialsample 0.4.0
- `spatial_nndm_cv()` is a new function for nearest neighbor distance matching
  cross-validation, as described in Milà et al. 2022
  (doi: 10.1111/2041-210X.13851). NNDM was first implemented in CAST
  (https://cran.r-project.org/package=CAST).
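
A rough sketch of calling `spatial_nndm_cv()`, assuming the `modeldata` package is installed to supply the `ames` housing data. The row ranges used for the training data and the prediction sites are arbitrary subsets chosen to keep the example fast.

```r
library(spatialsample)
library(sf)

# Assumption: modeldata is available; any sf object would work as input
data(ames, package = "modeldata")
ames_sf <- st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)

set.seed(123)

# First argument: the data to resample.
# Second argument: the locations where the model will be used to predict.
nndm_folds <- spatial_nndm_cv(ames_sf[1:100, ], ames_sf[2001:2100, ])

nndm_folds
```
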
spatialsample 0.3.0
Breaking changes
- `spatial_clustering_cv()` no longer accepts non-sf objects. Use
  `rsample::clustering_cv()` for these instead (#126).
- `spatial_clustering_cv()` now uses edge-to-edge distances, like the rest of
  the package, rather than centroids (#126).
New features
- All functions now have a `repeats` argument, defaulting to 1, allowing for
  repeated cross-validation (#122, #125, #126).
- `spatial_clustering_cv()` now has a `distance_function` argument, set by
  default to `as.dist(sf::st_distance(x))` (#126).
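
As an illustration of the two new arguments, the sketch below runs repeated clustering cross-validation on the bundled `boston_canopy` data. The function passed to `distance_function` simply restates the documented default; the fold and repeat counts are arbitrary.

```r
library(spatialsample)

set.seed(123)

# Two repeats of 5-fold spatial clustering CV
repeated_folds <- spatial_clustering_cv(
  boston_canopy,
  v = 5,
  repeats = 2,
  # Same computation as the default, written out explicitly
  distance_function = function(x) as.dist(sf::st_distance(x))
)

# One row per fold per repeat
nrow(repeated_folds)
```
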
Minor improvements and fixes
- Outputs from `spatial_buffer_vfold_cv()` should now have the correct `radius`
  and `buffer` attributes (#110).
- `spatial_buffer_vfold_cv()` now has the correct `id` values when using
  repeats (#116).
- `spatial_buffer_vfold_cv()` now throws an error when
  `repeats > 1 && v >= nrow(data)` (#116).
- The minimum `sf` version required is now `>= 1.0-9`, so that unit objects can
  be passed to `cellsize` in `spatial_block_cv()` (#113; #124).
- `autoplot()` now handles repeated cross-validation properly (#123).
spatialsample 0.2.1
- Mike Mahoney is taking over as package maintainer, as Julia Silge (who remains
  a package author) moves to focus on ModelOps work.
- Functions will now return rsplits without `out_id`, like most rsample
  functions, whenever `buffer` is `NULL`.
- `spatial_block_cv()`, `spatial_buffer_vfold_cv()`, and buffering now support
  using sf or sfc objects with a missing CRS. The assumption is that data in an
  NA CRS is projected, with all distance values in the same unit as the
  projection. Trying to use alternative units will fail. Set a CRS if these
  assumptions aren't correct.
- `spatial_buffer_vfold_cv()` and buffering no longer support tibble or
  data.frame inputs (they now require sf or sfc objects). These inputs were
  never easy to use and should always have caused an error: use
  `rsample::vfold_cv()` instead, or transform your data into an sf object.
- `spatial_buffer_vfold_cv()` has had some attribute changes to match `rsample`:
  - The `strata` attribute is now the name of the column used for
    stratification, or not set if there was no stratification.
  - `pool` and `breaks` have been added as attributes.
  - `radius` and `buffer` are now set to 0 if they were passed as `NULL`.
spatialsample 0.2.0
New features
- `spatial_buffer_vfold_cv()` is a new function which wraps
  `rsample::vfold_cv()`, allowing users to add inclusion radii and exclusion
  buffers to their vfold resamples. This is the supported way to perform
  spatially buffered leave-one-out cross validation (set `v` to `nrow(data)`).
- `spatial_leave_location_out_cv()` is a new function which wraps
  `rsample::group_vfold_cv()`, allowing users to add inclusion radii and
  exclusion buffers to their vfold resamples.
- `spatial_block_cv()` is a new function for performing spatial block
  cross-validation. It currently supports randomly assigning blocks to folds.
- `spatial_clustering_cv()` gains an argument, `cluster_function`, which
  specifies what type of clustering to perform. `cluster_function = "kmeans"`,
  the default, uses `stats::kmeans()` for k-means clustering, while
  `cluster_function = "hclust"` uses `stats::hclust()` for hierarchical
  clustering. Users can also provide their own clustering function.
- `spatial_clustering_cv()` now supports `sf` objects! Coordinates are inferred
  automatically when using `sf` objects, and anything passed to `coords` will
  be ignored with a warning. Clusters made using `sf` objects will take
  coordinate reference systems into account (using `sf::st_distance()`),
  unlike those made using data frames.
- All resampling functions now support spatial buffering using two arguments.
  `radius` lets you specify an inclusion radius for your test set, where any
  data within `radius` of the original assessment set will be added to the
  assessment set. `buffer` specifies an exclusion buffer around the test set,
  where any data within `buffer` of the assessment set (after `radius` is
  applied) will be excluded from both sets.
- `autoplot()` now has a method for spatial resamples built from `sf` objects.
  It works both on `rset` objects and on `rsplit` objects, and has a special
  method for outputs from `spatial_block_cv()`.
- `boston_canopy` is a new dataset with data on tree canopy change over time in
  Boston, Massachusetts, USA. It uses a projected coordinate reference system
  and US customary units; see `?boston_canopy` for instructions on how to
  install these into your PROJ installation if needed.
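
The buffering arguments described above can be sketched as follows, using the bundled `boston_canopy` data. The distance of 500 is an arbitrary illustration; a plain number is assumed here to be interpreted in the units of the data's CRS. The 50-row subset only keeps the leave-one-out example quick.

```r
library(spatialsample)

set.seed(123)

# Block CV with an exclusion buffer: data within 500 units of each
# assessment set is dropped from that split's analysis set
buffered_blocks <- spatial_block_cv(boston_canopy, v = 5, buffer = 500)

# Spatially buffered leave-one-out CV: set v to the number of rows.
# radius = NULL means no inclusion radius is applied.
canopy_subset <- boston_canopy[1:50, ]
buffered_loo <- spatial_buffer_vfold_cv(
  canopy_subset,
  v = nrow(canopy_subset),
  radius = NULL,
  buffer = 500
)

nrow(buffered_loo)
```
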
Documentation
- The "Getting Started" vignette has been revised to demonstrate the new
  features and clustering methods.
- A new vignette has been added walking through the spatial buffering process.
Dependency changes
- R versions before 3.4 are no longer supported.
- `glue`, `sf`, and `units` have been added to Imports.
- `ggplot2` has been moved to Imports. It had been in Suggests.
- `covr`, `gifski`, `lwgeom`, and `vdiffr` are now in Suggests.
- `rlang` now has a minimum version of 1.0.0 (was previously unversioned).
spatialsample 0.1.0
- First release of spatialsample