Dixon 2014
By Philip M. Dixon
Keywords: spatial point process, spatial point pattern, clustering, spatial scale, spatial segregation, edge
effect corrections
Abstract: Ripley’s K function summarizes spatial point process data. It can be used to
describe a set of locations, test hypotheses about patterns, and estimate parameters in a
spatial point process model. For a stationary point process, K(t) is the expected number of
additional points within distance t of a focal point divided by the intensity of the process. A
univariate version is used for one set of locations and a multivariate version is used when
points can be labeled by a small number of groups. This article reviews the properties
of Ripley’s K function and two related functions, then illustrates the computation and
interpretation using data on the locations of trees in a swamp hardwood forest.
Ripley’s K(t) function is a tool for analyzing spatial point process data (see Point Processes, Spatial), that is,
data on the locations of events. These are usually recorded in two dimensions, but they may be locations
along a line or in space. Here, K(t) is described for two-dimensional spatial data. Ripley’s K(t) function
can be used to summarize a point pattern, test hypotheses about the pattern, estimate parameters, and
fit models. Bivariate or multivariate generalizations can be used to describe relationships between two or
more point patterns. Applications include spatial patterns of trees [1–3], herbaceous plants [4], bird nests [5],
cells [6, 7], and disease cases [8]. Details of various theoretical aspects of K(t) are in books by Cressie [9],
Diggle [10], Illian et al. [11], Möller and Waagepetersen [12], and Ripley [13]. Examples of computation and
interpretation can be found in those books, Baddeley et al. [14], and Bivand et al. [15].
The K function is
$$K(t) = \lambda^{-1} E[\text{number of extra events within distance } t \text{ of a randomly chosen event}] \tag{1}$$
Based in part on the article “Ripley’s K function” by Philip M. Dixon, which appeared in the Encyclopedia of Environmetrics.
This article was originally published online in 2013 in Encyclopedia of Environmetrics, © John Wiley & Sons, Ltd and republished
in Wiley StatsRef: Statistics Reference Online, 2014.
Copyright © 2013 John Wiley & Sons, Ltd. All rights reserved.
Ripley’s K Function
where $\lambda$ is the intensity (number per unit area) of events [16, 17].
K(t) describes characteristics of the point processes at many distance scales. Alternative summaries
(e.g., mean nearest-neighbor distance or the cumulative distribution function (cdf) of distance from
random points to their nearest neighbors; see Nearest Neighbor Methods) do not have this property.
Many ecological point patterns show a combination of effects, for example, clustering at large scales and
regularity at small scales. The combination can be seen as a characteristic pattern in a plot of the K(t)
function.
K(t) does not uniquely define a point process in the sense that two different processes can have the
same K(t) function [18, 19]. Also, while K(t) is related to the nearest-neighbor distribution function [20, p.
158], the two functions describe different aspects of a point process. In particular, processes with the
same K(t) function may have different nearest-neighbor distribution functions, G(t), and vice versa. K(t)
is also closely related to the pair correlation function, g(t) [21, p. 218]. Stoyan and Penttinen [3] summarize
the relationships between K(t) and other statistics for spatial point processes.
Although it is usual to assume stationarity, K(t) can be defined and estimated for some nonstationary
processes [22]. It is also customary to assume isotropy, that is, one unit of distance in the Y direction has
the same effect as one unit of distance in the X direction (see Spatial Analysis in Ecology). If the degree of
anisotropy is known, then the definition of the distance t can be adjusted [23, p. 64].
For many point processes, the expectation in the numerator of the K(t) function in (1) can be analytically
evaluated, so that the K(t) function can be written in closed form. The simplest, and most commonly
used, is K(t) for a homogeneous Poisson process, also known as complete spatial randomness (CSR):
$$K(t) = \pi t^2 \tag{2}$$
A variety of processes can be used to model small-scale regularity. Hard-core processes are those in which
events cannot occur within some minimum distance of each other. In what is known as a Matérn hard-core
process [16, pp. 47–48], locations are a nonrandom thinning of a homogeneous Poisson process with
intensity $\rho$. Any pair of events separated by less than a critical distance $\delta$ is deleted. The remaining events
are a realization of a hard-core process. Soft-core processes are those where the number of neighbors
within some critical distance $\delta$ is smaller than expected under CSR, but the number is not zero. One
example is a Strauss process [24], in which a fraction, $1 - \gamma$, of the events within the critical distance, $\delta$, is
deleted. An approximation to the K(t) function for this process is [25]
$$K(t) = \begin{cases} \gamma \pi t^2, & 0 < t \le \delta \\ \pi t^2 - (1 - \gamma)\pi \delta^2, & t \ge \delta \end{cases} \tag{3}$$
Events may also be spatially clustered. One process that generates clustered locations is a Neyman–Scott
process in two dimensions. “Parent” events are a realization of a homogeneous Poisson process with
intensity $\rho$. Each parent event, i, generates a random number of “offspring” events $N_i$, where $N_i$ has a
Poisson distribution with mean m. The locations of the offspring, relative to the parent individual, have a
bivariate normal (Gaussian) distribution with zero means and variance $\sigma^2 I$. When locations of the parent
events are ignored, locations of the clustered offspring events are a realization of a Neyman–Scott process
[21]. The K(t) function for this process is [10]
$$K(t) = \pi t^2 + \rho^{-1}\left[1 - \exp\left(-t^2/(4\sigma^2)\right)\right] \tag{4}$$
A general Poisson cluster process has arbitrary distributions for the number of offspring per parent, N,
and the distance between offspring from the same parent, F(t). K(t) for this process is
$$K(t) = \pi t^2 + \frac{E[N(N-1)]\,F(t)}{\rho \mu^2} \tag{5}$$
where $\mu$ is the mean number of offspring per parent. K(t) functions can also be written for other clustered
and regular processes; see Refs [23, pp. 650–695], [26, pp. 63–85], or [21, pp. 371–407] for details.
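The closed forms above are easy to evaluate numerically. The following Python sketch is illustrative only: the function and argument names are assumptions, not part of the article. It reads the first branch of (3) as γπt² (which makes the approximation continuous at δ) and uses the standard closed form for bivariate normal offspring displacements, K(t) = πt² + [1 − exp(−t²/(4σ²))]/ρ, which is the special case of (5) with Poisson offspring counts.

```python
import math

def k_csr(t):
    """K(t) under complete spatial randomness, equation (2)."""
    return math.pi * t ** 2

def k_strauss_approx(t, delta, gamma):
    """Approximate K(t) for a Strauss process, equation (3)."""
    if t <= delta:
        return gamma * math.pi * t ** 2
    return math.pi * t ** 2 - (1.0 - gamma) * math.pi * delta ** 2

def k_neyman_scott(t, rho, sigma2):
    """K(t) for Poisson parents (intensity rho) with bivariate normal
    offspring displacements (variance sigma2 in each coordinate)."""
    return math.pi * t ** 2 + (1.0 - math.exp(-t ** 2 / (4.0 * sigma2))) / rho

def k_poisson_cluster(t, rho, mu, e_n_n_minus_1, dist_cdf):
    """General Poisson cluster process, equation (5).

    e_n_n_minus_1 is E[N(N - 1)] and dist_cdf is F(t), the cdf of the
    distance between two offspring of the same parent.
    """
    return math.pi * t ** 2 + e_n_n_minus_1 * dist_cdf(t) / (rho * mu ** 2)
```

For Poisson offspring counts with mean m, E[N(N − 1)] = m² and μ = m, so `k_poisson_cluster` reduces to `k_neyman_scott` when `dist_cdf` is the Rayleigh-type cdf 1 − exp(−t²/(4σ²)).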
Estimating K(t)
Given the locations of all events within a defined study area, how can K(t) be estimated? K(t) is a ratio of
a numerator and the intensity of events, $\lambda$. The intensity can be estimated as $\hat{\lambda} = N/A$, where N is the
observed number of points and A is the area of the study region. It is customary to condition on N, so the
uncertainty in $\hat{\lambda}$ can be ignored, although unconditionally unbiased estimators have been suggested [27].
If edge effects are ignored, then the numerator can be estimated by $N^{-1} \sum_i \sum_{j \ne i} I(d_{ij} < t)$, where $d_{ij}$ is the
distance between the ith and jth points and I(x) is the indicator function with the value 1 if x is true and
0 otherwise. However, the boundaries of the study area are usually arbitrary. Edge effects arise because
points outside the boundary are not counted in the numerator, even if they are within distance t of a point
in the study area. Ignoring edge effects biases the estimator $\hat{K}(t)$, especially at large values of t.
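As a minimal sketch of the uncorrected estimator just described, the Python function below divides the double sum of indicators by N and by the estimated intensity N/A. The function name and the representation of points as (x, y) tuples are assumptions made for illustration.

```python
import math

def k_naive(points, t_values, area):
    """Estimate K(t) with no edge correction.

    points: list of (x, y) tuples; area: area A of the study region.
    """
    n = len(points)
    lam = n / area  # intensity estimate N / A
    # Distances for all ordered pairs (i, j) with i != j.
    dists = [math.dist(p, q)
             for i, p in enumerate(points)
             for j, q in enumerate(points) if i != j]
    return [sum(d < t for d in dists) / n / lam for t in t_values]
```

For example, the four corners of a unit square placed in a region of area 4 give `k_naive(pts, [0.5, 1.2, 2.0], 4.0) == [0.0, 2.0, 3.0]`. Because neighbors outside the boundary are never counted, values for real data will be biased downward at larger t.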
A variety of edge-corrected estimators have been proposed. The most commonly used is due to Ripley [16]:
$$\hat{K}(t) = \hat{\lambda}^{-1} \sum_i \sum_{j \ne i} w(l_i, l_j)^{-1} \frac{I(d_{ij} < t)}{N} \tag{6}$$
As above, $d_{ij}$ is the distance between the ith and jth points and I(x) is the indicator function. The weight
function, $w(l_i, l_j)$, provides the edge correction. It has the value of 1 when the circle centered at $l_i$ and
passing through the point $l_j$ (i.e., with a radius of $d_{ij}$) is completely inside the study area. If part of the
circle falls outside the study area (i.e., if $d_{ij}$ is larger than the distance from $l_i$ to at least one boundary), then
$w(l_i, l_j)$ is the proportion of the circumference of that circle that falls in the study area. The effects of edge
corrections are more important for large t because large circles are more likely to be outside the study
area. Other edge-corrected estimators and their properties are summarized in Refs [23, pp. 616–618] and
[21, pp. 180–189]. Although $\hat{K}(t)$ can be determined for any t, it is a common practice to consider only t
less than one-half the shortest dimension of the study area, if the study area is approximately rectangular,
or $t < (A/2)^{1/2}$, where A is the area of the study region.
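The weighted estimator in (6) can be sketched for a rectangular study area as follows. This is not the exact geometric calculation used by the R packages: as a simplifying assumption, the weight is approximated by sampling equally spaced points on the circle and counting the fraction that fall inside the rectangle. All names are hypothetical.

```python
import math

def ripley_weight(p, d, width, height, n_angles=360):
    """Approximate proportion of the circle of radius d centred at p
    that lies inside the rectangle [0, width] x [0, height]."""
    x, y = p
    inside = 0
    for k in range(n_angles):
        a = 2.0 * math.pi * (k + 0.5) / n_angles
        cx, cy = x + d * math.cos(a), y + d * math.sin(a)
        if 0.0 <= cx <= width and 0.0 <= cy <= height:
            inside += 1
    return inside / n_angles

def k_ripley(points, t_values, width, height, n_angles=360):
    """Edge-corrected estimate of K(t), following equation (6)."""
    n = len(points)
    lam = n / (width * height)
    terms = []  # (distance, inverse weight) for each ordered pair
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i != j:
                d = math.dist(p, q)
                # Guard against a zero weight from the coarse sampling.
                w = max(ripley_weight(p, d, width, height, n_angles),
                        1.0 / n_angles)
                terms.append((d, 1.0 / w))
    return [sum(iw for d, iw in terms if d < t) / n / lam
            for t in t_values]
```

A point on the middle of an edge gets a weight near 0.5 and a corner point gets a weight near 0.25, so pairs near the boundary are counted two to four times as heavily as interior pairs.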
If the spatial process is nonstationary, but only because the intensity, $\lambda$, varies over the study area,
Baddeley’s inhomogeneous estimator [22] is:
$$\hat{K}_I(t) = |A|^{-1} \sum_i \sum_{j \ne i} w(l_i, l_j)^{-1} \frac{I(d_{ij} < t)}{\hat{\lambda}(l_i)\,\hat{\lambda}(l_j)} \tag{7}$$
One practical difficulty is the estimation of the intensity. This can be done nonparametrically if the scale
of variation in intensity is sufficiently larger than the scale of clustering or repulsion. $\hat{K}(t)$ is easy to
compute, except perhaps for the geometric aspects of the edge corrections. Edge-corrected estimators are
available in at least three R packages, splancs, spatial, and spatstat. Splancs and especially spatstat provide
a wide variety of modeling and simulation functions.
The simplest use of Ripley’s K(t) function is to test whether the observed events are consistent with a
specified spatial point process. Often, the first step in the analysis of point-pattern data is to test the null
hypothesis of CSR. Under CSR, $K(t) = \pi t^2$ for all t. In practice, it is easier to use $L(t) = [K(t)/\pi]^{1/2}$ and its
estimator:
$$\hat{L}(t) = [\hat{K}(t)/\pi]^{1/2} \tag{8}$$
because $\mathrm{var}[\hat{L}(t)]$ is approximately constant under CSR [28]. Under CSR, L(t) = t. Deviations from the
expected value at each distance, t, are used to construct tests of CSR. One approach is to test L(t) − t = 0
at each distance, t. Another is to combine information from a set of distances into a single test statistic,
such as $\hat{L}_m = \sup_t |\hat{L}(t) - t|$ or $\hat{L}_s = \sum_t |\hat{L}(t) - t|$. Critical values can be computed by Monte Carlo
simulation [29] or approximated. For $\hat{L}_m$, approximate 5% and 1% critical values are $1.42\sqrt{A}/N$ and
$1.68\sqrt{A}/N$ [28].
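A Monte Carlo test of CSR based on the maximum deviation statistic can be sketched as below. Two simplifications are assumed and are not taken from the article: the study area is a rectangle, and toroidal (wrap-around) distances stand in for an edge correction. Function names are hypothetical.

```python
import math
import random

def torus_distance(p, q, width, height):
    """Distance between p and q when opposite edges are joined."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return math.hypot(min(dx, width - dx), min(dy, height - dy))

def l_hat(points, t_values, width, height):
    """Estimate L(t) = sqrt(K(t) / pi) using toroidal distances."""
    n = len(points)
    lam = n / (width * height)
    d = [torus_distance(points[i], points[j], width, height)
         for i in range(n) for j in range(i + 1, n)]
    # Factor 2: each unordered pair counts once for each focal point.
    return [math.sqrt(2.0 * sum(x < t for x in d) / (n * lam * math.pi))
            for t in t_values]

def l_max(points, t_values, width, height):
    """Maximum absolute deviation of the estimated L(t) from t."""
    est = l_hat(points, t_values, width, height)
    return max(abs(l - t) for l, t in zip(est, t_values))

def csr_monte_carlo(points, t_values, width, height, n_sim=99, seed=0):
    """Return the observed statistic and its Monte Carlo P value."""
    rng = random.Random(seed)
    observed = l_max(points, t_values, width, height)
    exceed = 0
    for _ in range(n_sim):
        sim = [(rng.uniform(0, width), rng.uniform(0, height))
               for _ in points]
        if l_max(sim, t_values, width, height) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)
```

Each simulation conditions on the observed number of points, matching the convention of conditioning on N. The observed statistic can also be compared with the approximate 5% critical value, `1.42 * math.sqrt(width * height) / len(points)`.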
More complicated spatial processes, e.g., a Neyman–Scott process (4) or a Strauss process (3), can be
tested by similar comparisons if all the parameters of that process are known. Usually, parameter values
for more complicated spatial processes are not known a priori and must be estimated. One reasonable
approach is to find the values of $\theta$ that minimize a discrepancy measure between the observed $\hat{K}(t)$ and
the theoretical $K(t, \theta)$. Diggle [10] suggests $\int_0^{t_0} [\hat{K}(t)^{0.25} - K(t, \theta)^{0.25}]^2\,dt$, which can be approximated by
$$D(\theta) = \sum_t [\hat{K}(t)^{0.25} - K(t, \theta)^{0.25}]^2 \tag{9}$$
where the sum is over a subset of values of t between 0 and $t_0$. The exponent of 0.25 is chosen empirically
to give reasonable results for a variety of random and aggregated patterns [26, p. 87]. The upper limit, $t_0$, is
chosen to span the biologically important spatial scales. Large values of $t_0$ relative to the size of the study
area should be avoided because of the large uncertainty in $\hat{K}(t)$ for large t. This model-fitting approach
can be extended to fit processes for which K(t) cannot be written in closed form, so long as the process
can be simulated [23].
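The discrepancy in (9) can be minimized in many ways. The sketch below uses a plain grid search over candidate parameter values for a Neyman–Scott process with bivariate normal offspring. The closed form used for K(t, θ), the grid search itself, and all names are assumptions chosen for brevity; a numerical optimizer would normally replace the grid.

```python
import math

def k_thomas(t, rho, sigma2):
    """Theoretical K(t) for Poisson parents (intensity rho) and bivariate
    normal offspring displacements with variance sigma2."""
    return math.pi * t * t + (1.0 - math.exp(-t * t / (4.0 * sigma2))) / rho

def discrepancy(k_obs, t_values, rho, sigma2, power=0.25):
    """Sum of squared differences of power-transformed K, as in (9)."""
    return sum((ko ** power - k_thomas(t, rho, sigma2) ** power) ** 2
               for ko, t in zip(k_obs, t_values))

def fit_by_grid(k_obs, t_values, rho_grid, sigma2_grid):
    """Return the (rho, sigma2) pair on the grid with the smallest D."""
    best = min((discrepancy(k_obs, t_values, r, s), r, s)
               for r in rho_grid for s in sigma2_grid)
    return best[1], best[2]
```

As a check, feeding the function values of `k_thomas` computed at known parameters recovers those parameters whenever they lie on the grid, because the discrepancy is then exactly zero.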
Diagnostics for fitted models include estimation of residuals [23, pp. 656–657] or comparison with
simulated data (see Regression Diagnostics). Given estimates of the parameters and an algorithm to
simulate data from a particular spatial process, $\hat{K}(t)$ can be determined for a set of simulated realizations.
If the fitted model is reasonable, then the observed $\hat{K}(t)$ function should be similar to the $\hat{K}(t)$ from
the simulated data. Two-sided 95% pointwise confidence bounds for the fitted model can be estimated
by simulating 199, 499, or more realizations of the spatial pattern, then computing quantiles of $\hat{K}(t)$
for each t. This approach tends to overstate the confidence in the fit, since the fit is evaluated using the
same data and loss function that were used to estimate the parameters. This can be avoided by using K(t)
functions to estimate parameters and nearest-neighbor methods to evaluate the fit [26, pp. 89–90].
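The pointwise bounds described above amount to taking order statistics across simulated curves at each distance. A minimal sketch, assuming each simulated curve is a list of values on a common grid of distances (the function name and the choice of order-statistic rule are assumptions):

```python
def pointwise_envelope(sim_curves, alpha=0.05):
    """Lower and upper pointwise bounds from simulated curves.

    sim_curves: list of curves, each a list of values on the same grid
    of distances. With 199 curves and alpha = 0.05, the bounds are the
    5th smallest and 5th largest simulated values at each distance.
    """
    n_sim = len(sim_curves)
    k = int(round(alpha / 2.0 * (n_sim + 1)))  # rank of the order statistic
    if k < 1:
        raise ValueError("too few simulations for this alpha")
    lower, upper = [], []
    for values in zip(*sim_curves):  # iterate over distances
        s = sorted(values)
        lower.append(s[k - 1])
        upper.append(s[-k])
    return lower, upper
```

This is why simulation counts such as 199 or 499 are convenient: (n_sim + 1) × 0.025 is then a whole number, so the bounds are exact order statistics rather than interpolated quantiles.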
for example, inhibition at short distances and clustering at larger distances, it is more informative to look
at the spatial pattern at a specific distance. This is done using a pair correlation function [21, p. 218–221].
The pair correlation function, g(t), is defined as
$$g(t) = \frac{1}{2\pi t}\,\frac{dK(t)}{dt} \tag{10}$$
For a homogeneous Poisson process, g(t) = 1 for all distances. When the process exhibits clustering at a
particular distance, g(t) > 1 at that distance and when the process exhibits inhibition, g(t) < 1.
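The definition g(t) = K′(t)/(2πt) can be checked numerically for theoretical K functions. The sketch below differentiates K by central differences; it illustrates the definition only and is not an estimator for data, for which kernel smoothing is preferred. The names and the Neyman–Scott closed form are assumptions.

```python
import math

def pcf_from_k(k_func, t, h=1e-4):
    """Pair correlation g(t) from a smooth K function, using a central
    difference for dK/dt."""
    dk_dt = (k_func(t + h) - k_func(t - h)) / (2.0 * h)
    return dk_dt / (2.0 * math.pi * t)

def k_csr(t):
    """K(t) for a homogeneous Poisson process."""
    return math.pi * t * t

def k_thomas(t, rho=0.0034, sigma2=24.1):
    """K(t) for a Neyman-Scott process with bivariate normal offspring."""
    return math.pi * t * t + (1.0 - math.exp(-t * t / (4.0 * sigma2))) / rho
```

Here `pcf_from_k(k_csr, 3.0)` is 1 up to rounding error, while `pcf_from_k(k_thomas, 3.0)` exceeds 1, in agreement with the analytical result g(t) = 1 + exp(−t²/(4σ²))/(4πσ²ρ) for this clustered process.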
Estimating the pair correlation function is more difficult than estimating K(t), for the same reason
that estimating a probability density function is more difficult than estimating a cumulative distribution
function. Although a histogram-like estimator has been proposed (the O-ring estimator [30]), it is better to
use kernel smoothing.
Because the behavior of the pair correlation function is more closely tied to the spatial scale of a
process, it is easier to interpret [21, p. 218], which argues strongly for more widespread use of g(t) as the
primary summary statistic for a spatial pattern. If the interest is primarily in a test of a specified spatial
process, for example, CSR, it is not clear which statistic, g(t) or K(t), provides the more powerful test. The
answer may depend on the details of the deviation from the specified process.
The previous analyses considered only the location of an event; they ignored any other information about
that event. Many point patterns include biologically interesting information about each point, for example,
species identifiers (if the points include more than one type of species), whether the individual survived
or died (for spatial patterns of trees or other plants), and whether a location is a disease case or a randomly
selected control. Such data are examples of multivariate spatial point patterns, which are forms of marked
point patterns that have a small number of discrete marks. In the previous examples, the marks are the
species identifier, the fate (live or dead) or the disease status (case or control), respectively. The univariate
methods in the previous section can be used to analyze or model the spatial pattern of all individuals
(ignoring the marks) or the separate patterns in each type of mark. Many biological questions concern
the relationships between marks, however, for which the multivariate methods described in this section
are needed.
The generalization of K(t) to more than one type of point (a multivariate spatial point process) is
$$K_{ij}(t) = \lambda_j^{-1} E[\text{number of type } j \text{ events within distance } t \text{ of a randomly chosen type } i \text{ event}] \tag{11}$$
When there are g types of events, there are $g^2$ K functions, $K_{11}(t), K_{12}(t), \ldots, K_{1g}(t), K_{21}(t), \ldots, K_{2g}(t), \ldots, K_{gg}(t)$.
It is helpful to distinguish the cross-K functions $K_{ij}(t)$, where $i \ne j$, from the self-K functions, $K_{ii}(t)$.
Analytical expressions for $K_{ij}(t)$ are known for various multivariate point processes; see Ref. [23, pp. 699–707]
or [26, pp. 82–85]. Estimators of each bivariate $K_{ij}(t)$ function are similar to estimators of univariate
K(t) functions. If edge corrections are not needed, then $\hat{K}_{ij}(t) = (\hat{\lambda}_i \hat{\lambda}_j A)^{-1} \sum_k \sum_l I(d_{i_k, j_l} < t)$, where
$d_{i_k, j_l}$ is the distance between the kth location of type i and the lth location of type j and A is the area of
the study region. Various edge corrections have been suggested; one common example is the extension
of the weighted estimator in (6):
$$\hat{K}_{ij}(t) = (\hat{\lambda}_i \hat{\lambda}_j A)^{-1} \sum_k \sum_l w(i_k, j_l)^{-1} I(d_{i_k, j_l} < t) \tag{12}$$
where $w(i_k, j_l)$ is the fraction of the circumference of a circle centered at the kth location of process i with
radius $d_{i_k, j_l}$ that lies inside the study area.
If the spatial process is stationary, then corresponding pairs of cross-K functions are equal, that is,
$K_{12}(t) = K_{21}(t)$ and $K_{ij}(t) = K_{ji}(t)$. When edge corrections are used, $\hat{K}_{ij}(t)$ and $\hat{K}_{ji}(t)$ are positively
correlated but not equal. This suggests the use of a more efficient estimator,
$\hat{K}^*_{ij}(t) = [\hat{\lambda}_j \hat{K}_{ij}(t) + \hat{\lambda}_i \hat{K}_{ji}(t)]/(\hat{\lambda}_i + \hat{\lambda}_j)$ [32], although other linear combinations of $\hat{K}_{ij}(t)$ and $\hat{K}_{ji}(t)$ may have even smaller
variance.
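A minimal sketch of the cross-K estimator without edge correction, together with the intensity-weighted combination of the two cross estimates, is given below. Names are hypothetical. Because both types share the same study area, the weights $\hat{\lambda}_j$ and $\hat{\lambda}_i$ can be replaced by the counts of each type.

```python
import math

def cross_k(points_i, points_j, t_values, area):
    """Estimate K_ij(t) with no edge correction."""
    lam_i = len(points_i) / area
    lam_j = len(points_j) / area
    d = [math.dist(p, q) for p in points_i for q in points_j]
    return [sum(x < t for x in d) / (lam_i * lam_j * area)
            for t in t_values]

def combined_cross_k(k_ij, k_ji, n_i, n_j):
    """Weighted combination [lam_j K_ij + lam_i K_ji] / (lam_i + lam_j).

    n_i and n_j are the numbers of type i and type j points; the common
    area cancels from the weights.
    """
    return [(n_j * a + n_i * b) / (n_i + n_j) for a, b in zip(k_ij, k_ji)]
```

Without edge correction the two cross estimates are identical, so the combination changes nothing; it matters only once edge-corrected estimates, which weight pairs asymmetrically, are supplied.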
Questions about the relationship between two spatial processes can be asked in two different ways. The
independence approach [32] conditions on the marginal structure of each process and asks questions about
the interaction between the two processes. The random labeling approach [33] conditions on the observed
locations and asks questions about the process that assigns labels to points. The distinction between
independence and random labeling of two spatial processes requires some care and consideration. When
there is no relationship between the two processes, the two approaches lead to different expected values
of the cross-K function, K12 (t), and to different nonparametric test procedures.
Under independence, the cross-type K function is $K_{12}(t) = \pi t^2$, regardless of the individual univariate
spatial patterns of the two types of events. It is easier to work with the corresponding $\hat{L}^*_{ij}(t) = [\hat{K}^*_{ij}(t)/\pi]^{1/2}$
function, because the variance of $\hat{L}^*_{12}(t)$ is approximately constant. Under independence, $L^*_{12}(t) = t$.
Values of $\hat{L}_{12}(t) - t > 0$ indicate association between the two processes at distance t; values less than 0
indicate repulsion. As with the univariate functions, tests can be based on the distribution of $\hat{L}_{12}(t)$ (or
$\hat{K}^*_{12}(t)$) at each distance t, or on summary statistics such as $\max_{0 < t \le t_0} |\hat{L}_{12}(t) - t|$. Determining critical
values for a test of independence is more difficult than in the univariate setting since inferences are
conditional on the marginal structure of each type of event [32]. This requires maintaining the univariate
spatial pattern of each process, but breaking any dependence between them. If both univariate spatial
patterns can be described by parametric models, then it is easy to estimate the critical values by simulating
independent realizations of each parametric spatial process. A nonparametric alternative is to use pattern
reconstruction [34] to simulate independent realizations of each process without specifying parametric
models.
The method of toroidal shifts provides a second nonparametric way to test independence when the
study area is rectangular. All the locations for one type of event are displaced by a randomly chosen
displacement $(\Delta X, \Delta Y)$. The study area is treated as a torus, so the upper and lower edges are connected
and the right and left edges are connected. $\hat{K}^*_{12}(t)$ and the desired test statistics are computed from the
randomly shifted data. Random displacement and estimation of the test statistic(s) are repeated a large
number of times to estimate critical values for the test statistic(s). In practice, the toroidal shift method
appears to be sensitive to the assumption that the multivariate spatial process is stationary.
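The toroidal shift procedure can be sketched as follows for a rectangle with its lower-left corner at the origin. The helper names are hypothetical, and `statistic` stands for any user-supplied function of the two point sets (for example, a maximum deviation of a cross-L estimate from t).

```python
import random

def toroidal_shift(points, width, height, rng):
    """Displace every point by one random (dx, dy), wrapping around the
    edges so that the rectangle behaves as a torus."""
    dx = rng.uniform(0, width)
    dy = rng.uniform(0, height)
    return [((x + dx) % width, (y + dy) % height) for x, y in points]

def toroidal_shift_null(points_1, points_2, statistic, width, height,
                        n_sim=999, seed=0):
    """Null distribution of a test statistic under toroidal shifts.

    Type 1 points stay fixed; type 2 points are shifted as a block, which
    preserves their own pattern but breaks any link with type 1.
    """
    rng = random.Random(seed)
    return [statistic(points_1,
                      toroidal_shift(points_2, width, height, rng))
            for _ in range(n_sim)]
```

Critical values are then taken as quantiles of the returned list, and the observed statistic is compared against them.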
Under random labeling, $K_{12}(t) = K_{21}(t) = K_{11}(t) = K_{22}(t) = K(t)$, that is, all four bivariate K(t) functions
equal the K function for all events, ignoring their labels (since each type of event is a random thinning
of all events). Departure from random labeling can be examined using pairwise differences between
K functions. Each pairwise difference evaluates different biological effects. $\hat{K}_{11}(t) - \hat{K}_{22}(t)$ evaluates
whether one type of event is more (or less) clustered than the other. Diggle and Chetwynd [8] use this
to examine disease clustering. $\hat{K}_{11}(t) - \hat{K}^*_{12}(t)$ and $\hat{K}_{22}(t) - \hat{K}^*_{12}(t)$ evaluate whether one type of event
tends to be surrounded by other events of the same type. Gaines et al. [26] use this to examine spatial
segregation of waterbird foraging sites.
Inference is based on either a Monte Carlo simulation or a normal approximation. The appropriate
simulation approach fixes the combined set of locations and the number of each type of event, then
randomly assigns labels to locations. In general, the variance of any of the three differences increases
with t, so summary statistics should be based on the studentized difference, for example,
$\max_t [|\hat{K}_{11}(t) - \hat{K}_{22}(t)| / \mathrm{sd}(\hat{K}_{11}(t) - \hat{K}_{22}(t))]$. The variance can be calculated given t, the number of each type of point,
and the spatial pattern of the combined set of locations.
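The random labeling simulation can be sketched as below. The combined locations and the number of each type are held fixed, and only the labels are permuted. The names are hypothetical, and `statistic` is any user-supplied function of the two point sets, such as a studentized difference of K estimates.

```python
import random

def random_relabel(points, n_type1, rng):
    """Randomly assign n_type1 of the fixed locations to type 1 and the
    remainder to type 2."""
    shuffled = list(points)
    rng.shuffle(shuffled)
    return shuffled[:n_type1], shuffled[n_type1:]

def random_labeling_pvalue(type1, type2, statistic, n_sim=999, seed=0):
    """One-sided Monte Carlo P value for large values of the statistic."""
    rng = random.Random(seed)
    observed = statistic(type1, type2)
    combined = list(type1) + list(type2)
    exceed = 0
    for _ in range(n_sim):
        sim1, sim2 = random_relabel(combined, len(type1), rng)
        if statistic(sim1, sim2) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

The observed labeling is counted as one of the n_sim + 1 equally likely labelings under the null hypothesis, which is why 1 is added to both numerator and denominator.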
The use of Ripley’s K(t) functions to examine spatial patterns is illustrated using the data set described in the entry on
nearest neighbor methods. These data are the locations of all 630 trees (stems larger than
11.5 cm diameter at breast height) in a 1-ha plot of swamp hardwood forest in South Carolina, USA. These
trees represent 13 species, but most (over 75%) are black gum, Nyssa sylvatica, water tupelo, Nyssa aquatica,
or bald cypress, Taxodium distichum. Visually (Figure 1 in the article on nearest neighbor methods), most
trees seem to be scattered randomly throughout the plot, but cypress trees appear to be clustered. Ripley’s
K(t) functions provide a way to summarize those spatial patterns, fit models to describe the patterns, and
compare the patterns of different species.
The spatial pattern of all 630 trees and the spatial pattern of the 91 cypress trees can be described
using univariate K(t) statistics. Because the plot is 50 m × 200 m, K(t) is estimated for distances up to
25 m in 0.5 m increments. $\hat{K}(t)$ for all trees lies above the expected value of $\pi t^2$ for all distances between
1 and 10 m, but the large range of the Y axis makes it difficult to see the effects. The patterns are much
clearer in the plot of L(t) − t vs distance (Figure 1a). There is evidence of weak, but statistically significant,
clustering of trees. $\hat{L}(t) - t$ lies above the upper 0.975 quantile for all distances up to 17 m and above
the expected value of 0 for all distances up to 25 m.
Although the deviation from complete spatial randomness is statistically significant, its magnitude is
small. A biologically relevant summary of the clustering is to compute the proportion of excess trees in
a specified circle around a randomly chosen tree. This is estimated by $\hat{K}(t)/E[\hat{K}(t)] - 1$ at a specific
distance t. For the all-trees data set, this proportion is small (5.6%) for 6-m radius circles.
For cypress trees, the plot of their $\hat{L}(t) - t$ (Figure 1b) indicates two different departures from randomness.
At very short distances (less than or equal to 2 m), $\hat{L}(t) - t$ is less than 0, indicating spatial
regularity. At longer distances (greater than or equal to 3 m), $\hat{L}(t) - t$ is larger than 0, indicating spatial
clustering. The observed $\hat{L}(t) - t$ curve is much larger than the pointwise 0.975 quantiles for distances
from 4 m to 25 m and both the maximum and mean summary statistics are highly significant (P = 0.001).
This clustering represents a biologically large effect. In a 6-m radius circle, each cypress tree is surrounded
by an estimated 88% more cypress trees than expected if cypress trees were randomly distributed.
The scale of the spatial patterning is clarified by the plot of g(t) (Figure 1c). ĝ(t) is < 1 at 0.5 m and 1.0 m,
suggesting inhibition at short distances, but the uncertainty is large. ĝ(t) is larger than the 0.975 quantile
under CSR for distances from 2.5 m to 12 m, indicating clustering of trees at intermediate distances. The
strongest clustering is at 3 m.
The larger-scale clustering pattern can be described by fitting the K(t) function for a Neyman–Scott
process (4). Parameters are estimated by minimizing the loss function given in (9) for distances from
5 m to 35 m. (Shorter distances are excluded because the small-scale spatial inhibition is not of interest here.)
Figure 1. L(t) and g(t) plots for swamp trees. (a) Plot of $\hat{L}(t) - t$ for all 630 trees. Solid horizontal line provides
a reference for L(t) under complete spatial randomness. Dashed lines are 0.025 and 0.975 quantiles of L(t) − t
estimated from 999 simulations. (b) Plot of $\hat{L}(t) - t$ for 91 cypress trees. Line markings are the same as in (a).
(c) Plot of g(t) for 91 cypress trees. Line markings are the same as in (b). (d) Plot of $\hat{L}(t) - t$ for 91 cypress trees
fit to a Neyman–Scott process (4). Solid horizontal line provides a reference under complete spatial randomness.
Dotted line is L(t) − t using the estimated parameters. Dashed lines are the 0.025 and 0.975 quantiles of L(t) − t
estimated from 999 simulations of a Neyman–Scott process.
The choices of 5 m and 35 m are arbitrary, but other reasonable values gave similar results. The estimated
parameters are the variance of daughter locations, $\hat{\sigma}^2 = 24.1$ m², and the intensity of mothers, $\hat{\rho} = 0.0034$.
The fitted K(t) function is very close to $\hat{K}(t)$ for distances larger than 5 m (Figure 1d). Pointwise
95% confidence bounds for the fitted K(t) function are computed by repeatedly simulating the Neyman–Scott
process using the estimated $\hat{\sigma}^2$ and $\hat{\rho}$, then estimating the 0.025 and 0.975 quantiles of $\hat{K}(t)$ at each
distance. The observed $\hat{K}(t)$ curve falls well inside the bounds except at 2 m. This deviation is due to the
small-scale regularity. It is possible to fit a more complicated process that combines small-scale regularity
and larger-scale clustering, similar to the more biologically detailed processes fit by Rathbun and Cressie
[35], but the theoretical K(t) function would have to be estimated by simulation [23].
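Simulating the fitted Neyman–Scott process, as needed for the confidence bounds above, can be sketched as follows. Several choices here are assumptions and not taken from the article: parents are generated on a window enlarged by four standard deviations so that clusters centred just outside the plot still contribute offspring, Poisson counts are drawn with Knuth's multiplication method, and the mean number of offspring per parent, which the text does not report, is set to the observed cypress intensity divided by the fitted parent intensity.

```python
import math
import random

def poisson_draw(rng, mean):
    """Poisson variate by Knuth's multiplication method (adequate for
    means up to a few hundred)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_neyman_scott(rho, m, sigma2, width, height, rng):
    """Offspring locations inside [0, width] x [0, height]."""
    sd = math.sqrt(sigma2)
    margin = 4.0 * sd  # buffer for parents outside the plot
    n_parents = poisson_draw(
        rng, rho * (width + 2 * margin) * (height + 2 * margin))
    points = []
    for _ in range(n_parents):
        px = rng.uniform(-margin, width + margin)
        py = rng.uniform(-margin, height + margin)
        for _ in range(poisson_draw(rng, m)):
            x, y = rng.gauss(px, sd), rng.gauss(py, sd)
            if 0.0 <= x <= width and 0.0 <= y <= height:
                points.append((x, y))
    return points

# Fitted values from the text; m is an assumed value (91 trees / 1 ha / rho).
rng = random.Random(42)
m_assumed = (91 / 10000.0) / 0.0034
sim_points = simulate_neyman_scott(0.0034, m_assumed, 24.1, 50.0, 200.0, rng)
```

Unlike the CSR simulations, the number of simulated points varies between realizations; whether to condition on the observed count of 91 is a design choice left open here.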
Even though a Neyman–Scott process describes the spatial pattern quite well, it is inappropriate here
to conclude that it is the mechanism responsible for the clustering. Other mechanisms can lead to exactly
the same pattern [36]. The plot, like most of the swamp, is not a homogeneous environment. In particular,
some areas are above the mean water level, others are in shallow water, and still others are in deep
channels. Cypress are known to be most successful in parts of the swamp with shallow to moderately
deep water. Other trees, for example, black gum, prefer drier areas. The clustering of cypress could simply
be a response to a heterogeneous environment; this hypothesis could be tested if environmental data
such as water depth or elevation were available [37].
Patterns with small-scale regularity and large-scale clustering are quite common for ecological data,
especially when individuals are large, as cypress trees can be. Diameter at breast height for the 91 cypress
trees in the plot ranges from 15 cm to 180 cm, with a median of 105 cm. It is physically impossible for two
median-sized cypress trees to be closer than 1 m. However, this small-scale separation of stems occurs in
conjunction with a larger-scale clustering of individuals into patches. The L(t) and g(t) statistics provide
evidence of both ecological processes.
Visually, cypress and black gum trees appear to be spatially segregated, that is, cypress tend to be
found in patches of mostly cypress and black gum tend to be found in patches of mostly black gum.
This pattern can be described and evaluated using the bivariate K statistics. The subscripts C and G
are used to represent cypress patterns and black gum patterns, respectively. As described above, two
different hypotheses (random labeling and independent processes) could be used to describe the absence
of dependence between cypress and black gum.
Under random labeling, $K_{CC}(t) = K_{CG}(t) = K_{GG}(t)$. If cypress trees tend to occur in patches of other
cypress trees, then $K_{CC} > K_{CG}$, while if black gums tend to occur in patches of other black gums,
$K_{GG} > K_{GC}$. Each species can be evaluated by estimating differences of K functions and their uncertainty
under random labeling. The plot of $\hat{K}_{CC}(t) - \hat{K}^*_{CG}(t)$ is above zero and well outside the pointwise 0.025 and 0.975 quantiles
for all distances larger than 3 m (Figure 2a). The plot of $\hat{K}_{GG}(t) - \hat{K}^*_{CG}(t)$ is above zero for all distances
larger than 2 m and well outside the pointwise 0.025 and 0.975 quantiles for all distances larger than 3 m (Figure 2b). Summary
statistics combining tests at all distances are highly significant (P < 0.001). These two species are not
randomly labeled; instead, both are spatially segregated.
The two sets of locations are also not spatially independent. If they were, then $K_{CG}(t) = \pi t^2$ at all
distances, t. As with univariate tests of randomness, it is easier to visualize patterns in the equivalent
$\hat{L}_{CG}(t) - t$ plot (Figure 2c). For cypress and black gum, $\hat{L}_{CG}(t) - t$ is less than 0 for all distances and
below the lower 0.025 quantile for most distances larger than 3 m. The number of black gum
trees in the neighborhood of cypress (or equivalently the number of cypress trees in the neighborhood
of black gums) is less than expected. The observed value of $\hat{L}_{CG}(t) - t$ under toroidal shifts is not as
extreme a value as those seen under random labeling. The pointwise two-sided P values for the test of
independence range from 0.002 to 0.082 for distances from 3 m to 35 m. The conclusion is
that the spatial pattern of cypress trees is not independent of the black gum spatial pattern.
The hypotheses of independent processes and random labeling are not equivalent. However, when
both hypotheses are appropriate, which test is the more powerful? A detailed comparison has not been
made, but it is possible to compare distributions of $\hat{K}^*_{CG}(t)$ using specific data sets. $\hat{K}^*_{CG}(t)$ for random
labeling is less variable than $\hat{K}^*_{CG}(t)$ for toroidal shifts. This is illustrated using the 0.025 and 0.975
quantiles of $\hat{K}^*_{CG}(t)$ (Figure 2d). The random labeling quantiles are considerably less extreme than the
toroidal shift quantiles.
Related Articles
See also Ecological Statistics; Edge effect; Spatial Processes; Spatial Data Analysis; Spatial covariance;
Spatial Autocorrelation Coefficient, Moran’s; Point processes, spatial–temporal.
Figure 2. Bivariate K(t) plots to evaluate the spatial relationship between cypress and black gum trees. (a)
Plot of $\hat{K}_{CC}(t) - \hat{K}_{CG}(t)$ for cypress trees. Solid horizontal line at 0 provides a reference for random labeling.
Dotted lines are 0.025 and 0.975 quantiles of the difference estimated from 999 random relabelings. (b) Plot of
$\hat{K}_{GG}(t) - \hat{K}_{CG}(t)$ for black gum trees. Solid horizontal line at 0 provides a reference for random labeling.
Dotted lines are 0.025 and 0.975 quantiles of the difference estimated from 999 random relabelings. (c) Plot of
$\hat{L}_{CG}(t) - t$ for cypress and black gum trees. Solid horizontal line at 0 provides a reference for independence of
the two spatial processes. Dotted lines are 0.025 and 0.975 quantiles of L(t) − t estimated from 999 random
toroidal shifts. (d) Comparison of 0.025 and 0.975 quantiles computed by random labeling (dotted lines) and
random toroidal shifts (dashed lines).
References
[1] Duncan, R.P. (1993). Testing for life historical changes in spatial patterns of four tropical tree species in Westland,
New Zealand, Journal of Ecology 81, 403–416.
[2] Peterson, C.J. & Squiers, E.R. (1995). An unexpected change in spatial pattern across 10 years in an Aspen-White Pine
forest, Journal of Ecology 83, 847–855.
[3] Stoyan, D. & Penttinen, A. (2000). Recent applications of point process methods in forestry statistics, Statistical Science
15, 61–78.
[4] Stamp, N.E. & Lucas, J.R. (1990). Spatial patterns and dispersal distance of explosively dispersing plants in Florida
sandhill vegetation, Journal of Ecology 78, 589–600.
[5] Bayard, T.S. & Elphick, C.S. (2010). Using spatial point-pattern assessment to understand the social and environmental
mechanisms that drive avian habitat selection, Auk 127, 485–494.
[6] Diggle, P.J. (1986). Displaced amacrine cells in the retina of a rabbit: analysis of a bivariate spatial point pattern, Journal
of Neuroscience Methods 18, 115–125.
[7] Dancause, N., Barbay, S., Frost, S.B., Plautz, E.J., Popescu, M., Dixon, P.M., Stowe, A.M., Friel, K.M., & Nudo, R.J. (2006).
Topographically divergent and convergent connectivity between premotor and primary motor cortex, Cerebral Cortex 16,
1057–1068.
[8] Diggle, P.J., & Chetwynd, A.G. (1991). Second-order analysis of spatial clustering for inhomogeneous populations,
Biometrics 47, 1155–1163.
[9] Cressie, N.A.C. (1991). Statistics for Spatial Data, John Wiley & Sons, New York.
[10] Diggle, P.J. (2003). Statistical Analysis of Spatial Point Patterns, 2nd Edition, Arnold, London.
[11] Illian, J., Penttinen, A., Stoyan, H., & Stoyan, D. (2008). Statistical Analysis and Modelling of Spatial Point Patterns, John
Wiley & Sons Ltd, Chichester.
[12] Möller, J. & Waagepetersen, R.P. (2004). Statistical Inference and Simulation for Spatial Point Processes, Chapman &
Hall/CRC, Boca Raton.
[13] Ripley, B.D. (1981). Spatial Statistics, John Wiley & Sons, New York.
[14] Baddeley, A., Gregori, P., Mateu, J., Stoica, R., & Stoyan, D., eds. (2006). Case Studies in Spatial Point Process Modeling,
Vol. 185, Lecture Notes in Statistics, Springer, Berlin.
[15] Bivand, R.S., Pebesma, E.J., & Gómez-Rubio, V. (2008). Applied Spatial Data Analysis with R, Springer, New York.
[16] Ripley, B.D. (1976). The second-order analysis of stationary point processes, Journal of Applied Probability 13, 255–266.
[17] Ripley, B.D. (1977). Modelling spatial patterns, Journal of the Royal Statistical Society, Series B 39, 172–192.
[18] Baddeley, A.J. & Silverman, B.W. (1984). A cautionary example on the use of second-order methods for analyzing point
patterns, Biometrics 40, 1089–1093.
[19] Lotwick, H.W. (1984). Some models for multitype spatial point processes, with remarks on analysing multitype patterns,
Journal of Applied Probability 21, 575–582.
[20] Matérn, B. (1986). Spatial Variation, Vol. 36, 2nd Edition, Lecture Notes in Statistics, Springer-Verlag, Berlin.
[21] Neyman, J. & Scott, E.L. (1952). A theory of the spatial distribution of galaxies, Astrophysical Journal 116, 144–163.
[22] Baddeley, A., Möller, J., & Waagepetersen, R. (2000). Non- and semi-parametric estimation of interaction in inhomogeneous
point patterns, Statistica Neerlandica 54, 329–350.
[23] Diggle, P.J. & Gratton, R.J. (1984). Monte Carlo methods of inference for implicit statistical models (with discussion),
Journal of the Royal Statistical Society, Series B 46, 193–227.
[24] Strauss, D.J. (1975). A model for clustering, Biometrika 62, 467–475.
[25] Isham, V. (1984). Multitype Markov point processes: some applications, Proceedings of the Royal Society of London, Series
A 391, 39–53.
[26] Gaines, K.F., Bryan, A.L. Jr, & Dixon, P.M. (2000). The effects of drought on foraging habitat selection in breeding wood
storks in coastal Georgia, Waterbirds 23, 64–73.
[27] Doguwa, S.I. & Upton, G.J.G. (1989). Edge-corrected estimators for the reduced second moment measure of point
processes, Biometrical Journal 31, 563–576.
[28] Ripley, B.D. (1979). Tests of ‘randomness’ for spatial point patterns, Journal of the Royal Statistical Society, Series B 41,
368–374.
[29] Besag, J. & Diggle, P.J. (1977). Simple Monte Carlo tests for spatial pattern, Applied Statistics 26, 327–333.
[30] Wiegand, T., Moloney, K.A., Naves, J., & Knauer, F. (1999). Finding the missing link between landscape structure and
population dynamics: a spatially explicit perspective, The American Naturalist 154, 605–627.
[31] Hanisch, K.H. & Stoyan, D. (1979). Formulas for second-order analysis of marked point processes, Mathematische
Operationsforschung und Statistik, Series Statistics 14, 559–567.
[32] Lotwick, H.W. & Silverman, B.W. (1982). Methods for analysing spatial processes of several types of points, Journal of
the Royal Statistical Society, Series B 44, 406–413.
[33] Cuzick, J. & Edwards, R. (1990). Spatial clustering for inhomogeneous populations (with discussion), Journal of the Royal
Statistical Society, Series B 52, 73–104.
[34] Tscheschel, A. & Stoyan, D. (2006). Statistical reconstruction of random point patterns, Computational Statistics & Data
Analysis 51, 859–871.
[35] Rathbun, S.L. & Cressie, N. (1994). A space–time survival point process for a longleaf pine forest in southern Georgia,
Journal of the American Statistical Association 89, 1164–1174.
[36] Bartlett, M.S. (1964). The spectral analysis of two-dimensional point processes, Biometrika 51, 299–311.
[37] Rathbun, S.L. (1996). Estimation of Poisson intensity using partially observed concomitant variables, Biometrics 52,
226–242.