SAMPLING IN FIELD EXPERIMENTS
Seema Jaggi
I.A.S.R.I., Library Avenue, New Delhi - 110 012
In agricultural field experiments, the size of the plot is selected in order to achieve a
prescribed degree of precision for measurement of the character of primary interest. We
then measure the character under study on the whole of the experimental unit i.e. plot.
Because of the nature of the character of primary interest like yield, the plot size required
is often larger than that needed to measure other characters. In order to save expense and
time the measurements of additional characters of interest can be made by sampling a
fraction of the whole plot. For example, plant height can be measured on only, say, 10 of
the 200 plants in the plot; tiller number can be counted on only 1 m² of the 15 m² plot;
and leaf area can be measured on only 20 of the approximately 2000 leaves in the plot.
For characters such as plant height and leaf area, it may not always be feasible or
desirable to take plot-wise measurements. Here we resort to sampling in each plot:
measurements are obtained on a certain number of sampling units in each plot and the
data are subjected to statistical analysis.
An appropriate sample is one that provides an estimate, or a sample value, that is as close
as possible to the value that would have been obtained had all plants in the plot been
measured - the plot value. The difference between the sample value and the plot value
constitutes the sampling error. Thus a good sampling technique is one that gives small
sampling error.
The sampling unit is the unit on which actual measurement is made. The important
features of an appropriate sampling unit are:
• Ease of Identification
• Ease of Measurement
• High Precision
• Low Cost
The number of sampling units taken from the population is the sample size. In a replicated
field trial where each plot is a population, sample size could be the number of plants per
plot used for measuring plant height, the number of leaves per plot used for measuring
leaf area, or the number of hills per plot used for counting tillers. The required sample
size for a particular experiment is governed by:
(i) The size of the variability among sampling units within the same plot (sampling
variance)
(ii) The degree of precision desired for the character of interest.
In practice, the size of the sampling variance for most plant characters is generally not
known. The desired level of precision can, however, be prescribed by the researcher
based on experimental objective and previous experience, in terms of the margin of error,
either of the plot mean or of the treatment mean.
Plot Sampling
The sample size for a simple random sampling design that can satisfy a prescribed margin
of error of the plot mean is computed as:
$$n = \frac{Z_\alpha^2 \, v_s}{d^2 \, \bar{X}^2}$$
where $n$ is the required sample size, $Z_\alpha$ is the value of the standardized normal
variate corresponding to the level of significance $\alpha$, $v_s$ is the sampling variance,
$\bar{X}$ is the mean value, and $d$ is the margin of error expressed as a fraction of the
plot mean.
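The formula can be evaluated with a few lines of Python. This is a minimal sketch: the function name and the input values are hypothetical and chosen only for illustration, and the result is rounded up because a fractional sampling unit cannot be measured.

```python
import math

def sample_size_plot_mean(z_alpha, v_s, mean, d):
    """Sample size n = Z^2 * v_s / (d^2 * mean^2), rounded up to a whole unit."""
    n = (z_alpha ** 2) * v_s / ((d ** 2) * (mean ** 2))
    return math.ceil(n)

# Hypothetical inputs: Z = 1.96 (5% level), sampling variance 5.0,
# plot mean 20.0, margin of error 8% of the plot mean.
n = sample_size_plot_mean(z_alpha=1.96, v_s=5.0, mean=20.0, d=0.08)
print(n)  # 8
```

In practice the sampling variance and the mean would come from earlier experiments on the same character, since they are rarely known in advance.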
The information of primary interest to the researcher is usually the treatment means (the
average over all plots receiving the same treatment) or actually the difference of means,
rather than the plot mean (the value from a single plot). Thus, the desired degree of
precision is usually specified in terms of the margin of error of the treatment mean rather
than of the plot mean. In such a case, sample size is computed as:
$$n = \frac{Z_\alpha^2 \, v_s}{r \, D^2 \, \bar{X}^2 - Z_\alpha^2 \, v_p}$$
where $n$ is the required sample size, $r$ is the number of replications, $Z_\alpha$ and
$v_s$ are as defined earlier, $v_p$ is the variance between plots of the same treatment
(i.e. experimental error), and $D$ is the prescribed margin of error expressed as a fraction
of the treatment mean. In this case, additional information on the size of the experimental
error ($v_p$) is needed to compute sample size.
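A short Python sketch of the treatment-mean formula follows. The function name and the inputs are hypothetical. The check on the denominator is an added safeguard, not part of the source: when r D² X̄² does not exceed Z² v_p, the plot-to-plot variation alone already exceeds the allowed margin, so no amount of within-plot sampling can reach the target.

```python
import math

def sample_size_treatment_mean(z_alpha, v_s, v_p, mean, D, r):
    """n = Z^2 v_s / (r D^2 mean^2 - Z^2 v_p), rounded up to a whole unit."""
    denom = r * D ** 2 * mean ** 2 - z_alpha ** 2 * v_p
    if denom <= 0:
        # Experimental error alone exceeds the allowed margin of error.
        raise ValueError("target precision unattainable: increase r or relax D")
    return math.ceil(z_alpha ** 2 * v_s / denom)

# Hypothetical inputs: Z = 1.96, v_s = 5.0, v_p = 0.5, mean = 20.0,
# margin of error 5% of the treatment mean, 4 replications.
n = sample_size_treatment_mean(1.96, v_s=5.0, v_p=0.5, mean=20.0, D=0.05, r=4)
print(n)  # 10
```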
A sampling design specifies the manner in which the n sampling units are to be selected
from the whole plot. There are five commonly used sampling designs in replicated field
trials: simple random sampling, multistage random sampling, stratified random sampling,
stratified multistage random sampling, and sub-sampling with an auxiliary variable.
In a simple random sampling design, there is only one type of sampling unit and, hence,
the sample size (n) refers to the total number of sampling units to be selected from each
plot consisting of N units. The selection of the n sampling units is done in such a way
that each of the N units in the plot is given the same chance of being selected. In plot
sampling, two of the most commonly used random procedures for selecting n sampling
units per plot are the random-number technique and the random-pair technique.
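As an illustration, simple random selection can be sketched in Python by numbering the N units in the plot and drawing n distinct numbers. This is one plausible reading of the random-number technique, not a description of the exact field procedure; the function name and plot size are assumptions.

```python
import random

def select_sampling_units(N, n, seed=None):
    """Draw n distinct unit numbers from 1..N, each unit equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return sorted(rng.sample(range(1, N + 1), n))

# Hypothetical plot with 200 plants, of which 10 are to be measured.
units = select_sampling_units(N=200, n=10, seed=1)
print(units)
```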
In contrast to the simple random sampling design, where only one type of sampling unit
is involved, the multistage random sampling design is characterized by a series of
sampling stages. Each stage has its own unique sampling unit. This design is suited for
cases where the sampling unit is not the same as the measurement unit. For example, in
a rice field experiment, the unit of measurement for panicle length is a panicle and that
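for leaf area is a leaf. The use of either the panicle or the leaf as the sampling unit,
however, would require the counting and listing of all panicles or all leaves in the plot,
a time-consuming task that would not be practical. With a two-stage design, single hills
can instead serve as the primary sampling unit and the panicles (or leaves) within each
selected hill as the secondary sampling unit.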
The stratified random sampling design is useful where there is large variation between
sampling units and where important sources of variability follow a consistent pattern. In
such cases, the precision of the sample estimate can be improved by grouping the
sampling units into different strata in such a way that variability between sampling units
within a stratum is smaller than that between sampling units from different strata. Some
examples of stratification criteria used in agricultural experiments are as follows:
• Soil Fertility Pattern. In an insecticide trial where blocking is based primarily on the
direction of insect migration, known patterns of soil fertility cause substantial
variability among plants in the same plot. In such a case, a stratified random
sampling design may be used so that each plot is first divided into several strata based
on the known fertility patterns and sample plants are then randomly selected from
each stratum.
• Stress Level. In a variety screening trial for tolerance to soil salinity, areas within
the same plot may be stratified according to the salinity level before sample plants are
randomly selected from each stratum.
• Within-Plant Variance. In a rice hill, panicles from the taller tillers are generally
larger than those from the shorter ones. Hence, in measuring such yield components
as panicle length or number of grains per panicle, panicles within a hill are stratified
according to the relative height of the tillers before sample panicles are randomly
selected from each position (or stratum).
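The common pattern in these examples (divide the plot into strata, then sample at random within each stratum) can be sketched as follows. The stratum names, plot size and sample size per stratum are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the same size from every stratum.

    strata maps a stratum name to the unit numbers belonging to it.
    """
    rng = random.Random(seed)
    return {
        name: sorted(rng.sample(list(units), n_per_stratum))
        for name, units in strata.items()
    }

# Hypothetical plot of 60 plants divided into three fertility strata.
strata = {
    "low": range(1, 21),
    "medium": range(21, 41),
    "high": range(41, 61),
}
sample = stratified_sample(strata, n_per_stratum=4, seed=7)
for name, plants in sample.items():
    print(name, plants)
```

Equal allocation across strata is used here only for simplicity; other allocations are possible.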
Stratified multistage random sampling: Consider the case where a rice researcher
wishes to measure the average number of grains per panicle through the use of a two-
stage sampling design with individual hills in the plot as the primary sampling unit and
individual panicles in a hill as the secondary sampling unit. It is realized that the number
of grains per panicle varies greatly between the different panicles of the same hill. A
logical alternative is to apply the stratification technique by dividing the panicles in each
selected hill (i.e., primary sampling unit) into k strata, based on their relative position in
the hill, before a simple random sample of m panicles from each stratum is taken
separately and independently for the k strata. In this case, the sampling technique is
based on a two-stage sampling design with stratification applied on the secondary unit.
Of course, instead of the secondary unit (panicles), the researcher could have stratified
the primary unit (i.e., single hills) based on any source of variation pertinent to the
experiment. In that case, the sampling technique would have been a two-stage sampling
design with stratification of the primary unit. Or, the researcher could have applied both
stratification criteria (one on the hills and another on the panicles), and the resulting
sampling design would have been a two-stage sampling with stratification of both the
primary and secondary units.
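The first variant (two-stage sampling with stratification of the secondary unit) can be sketched in Python. The sketch assumes, hypothetically, that the panicles of each hill are listed in order of relative position (for example, tallest tiller first) and that each of the k strata holds at least m panicles; all names and sizes are illustrative.

```python
import random

def stratified_two_stage(hills, n_hills, k, m, seed=None):
    """Stage 1: draw hills at random. Stage 2: split each selected hill's
    panicles into k position strata and draw m panicles from each stratum."""
    rng = random.Random(seed)
    chosen_hills = sorted(rng.sample(sorted(hills), n_hills))
    sample = {}
    for hill in chosen_hills:
        panicles = hills[hill]
        size = len(panicles)
        # Consecutive slices of the ordered list form the k strata.
        strata = [panicles[j * size // k:(j + 1) * size // k] for j in range(k)]
        sample[hill] = [sorted(rng.sample(s, m)) for s in strata]
    return sample

# Hypothetical plot: 20 hills, each with 12 panicles numbered by position.
hills = {h: list(range(1, 13)) for h in range(1, 21)}
sample = stratified_two_stage(hills, n_hills=3, k=3, m=2, seed=11)
for hill, strata_samples in sample.items():
    print(hill, strata_samples)
```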
Sub-sampling with an auxiliary variable. The main features of a design for
subsampling with an auxiliary variable are:
• In addition to the character of interest, say X, another character, say Z, which is
closely associated with and is easier to measure than X, is chosen.
• Character Z is measured both on the main sampling unit and on the subunit, whereas
variable X is measured only on the subunit. The subunit is smaller than the main
sampling unit and is embedded in the main sampling unit.
This design is usually used when the character of interest, say X, is so variable that the
large size of sampling unit or the large sample size required to achieve a reasonable
degree of precision, or both, would be impractical. To improve the precision in the
measurement of X without unduly increasing either the sample size or the size of the
sampling unit, subsampling with an auxiliary variable can be used.
Supplementary Techniques
So far, we have discussed sampling techniques for individual plots, each of which is
treated independently and without reference to other plots in the same experiment.
However, in a replicated field trial where the sampling technique is to be applied to
all plots in the trial, a question usually raised is whether the same set of random
samples can be repeated in all plots or whether different random processes are needed
for different plots. And, when data on a plant character are measured more than once
over time, the question is whether the measurements should be made on the same
samples at all stages of observation or whether randomization should be applied afresh.
The two techniques aimed at answering these questions are block sampling and
sampling for repeated measurements.
Block Sampling is a technique in which all plots of the same block (i.e. replication) are
subjected to the same randomization scheme (i.e. using the same sample locations in the
plot) and different sampling schemes are applied separately and independently for
different blocks. The block sampling technique has the following desirable features:
• Randomization is minimized. With block sampling, randomization is done only r
times instead of rt times, as would be needed if randomization were done separately
for every plot.
• Data collection is facilitated. With block sampling, all plots in the same block have
the same pattern of sample locations, so that an observer (data collector) can easily
move from plot to plot within a block without the need to reorient to a new
pattern of sample locations.
• Uniformity between plots of the same block is enhanced because there is no added
variation due to changes in sample location from plot to plot.
Data collection by block is encouraged. For example, if data collection is to be done by
several persons, each can be conveniently assigned to a particular block, which facilitates
the speed and uniformity of data collection. Even if there is only one observer for the
whole experiment, the observer can complete the task one block at a time, taking
advantage of the similar sample locations of plots in the same block and minimizing one
source of variation among plots, namely, the time span in data collection.
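A small Python sketch shows the idea: one random draw of sample locations per block, reused for every plot in that block. The function name, the numbering of locations within a plot and the trial dimensions are assumptions for illustration.

```python
import random

def block_sampling_plan(r, t, N, n, seed=None):
    """Return {(block, plot): sample locations}, randomizing once per block."""
    rng = random.Random(seed)
    plan = {}
    for block in range(1, r + 1):
        # Only r randomizations in total, one for each block.
        locations = sorted(rng.sample(range(1, N + 1), n))
        for plot in range(1, t + 1):
            plan[(block, plot)] = locations
    return plan

# Hypothetical trial: 4 blocks, 6 treatments, 100 hills per plot, 8 sampled.
plan = block_sampling_plan(r=4, t=6, N=100, n=8, seed=5)
print(plan[(1, 1)])
print(plan[(1, 6)])  # same locations as any other plot of block 1
print(plan[(2, 1)])  # block 2 has its own independent draw
```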
Sampling for Repeated Measurements: Plant characters are commonly measured at
different growth stages of the crop. For example, tiller number in rice may be measured
at 30, 60, 90 and 120 days after transplanting or at the tillering, flowering, and
harvesting stages. If such measurements are made on the same plants at all stages of
observation, the resulting data may be biased because plants that are subjected to
frequent handling may behave differently from others. In irrigated wetland rice, for
example, frequent trampling around plants, or frequent handling of plants, not only
affects the plant characters being measured but also affects the plants' final yields. On
the other hand, the use of an entirely different set of sample plants at different growth
stages could introduce variation due to differences between sample plants. The partial
replacement procedure provides a satisfactory compromise between the two
conflicting situations. With partial replacement, only a portion p of the sample plants
used in one growth stage is retained for measurement in the succeeding stage. The
other portion (1-p) of the sample plants is randomly obtained from the remaining plants
in the plot. The size of p depends on the size of the estimated undesirable effect of
repeated measurements of the sample plants in a particular experiment. The smaller this
effect, the larger p should be. For example, in the measurement of plant height and
tiller number in transplanted rice, p is usually about 0.75. That is, about 75% of the
sample plants measured at a given growth stage are retained for measurement in the
succeeding stage and the remaining 25% are obtained at random from the other plants in
the plot.
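The partial replacement step can be sketched in Python. The text does not say how the retained plants are chosen; this sketch assumes they are picked at random from the current sample, and the function name and plot size are hypothetical.

```python
import random

def partial_replacement(current_sample, all_plants, p, seed=None):
    """Keep a fraction p of the current sample and replace the rest
    with plants drawn at random from those not currently sampled."""
    rng = random.Random(seed)
    n = len(current_sample)
    n_keep = round(p * n)
    kept = rng.sample(list(current_sample), n_keep)  # assumption: random retention
    in_sample = set(current_sample)
    remaining = [plant for plant in all_plants if plant not in in_sample]
    fresh = rng.sample(remaining, n - n_keep)
    return sorted(kept + fresh)

# Hypothetical plot of 40 plants; 8 were measured at the previous stage.
previous = list(range(1, 9))
plants = list(range(1, 41))
next_sample = partial_replacement(previous, plants, p=0.75, seed=3)
print(next_sample)  # 6 plants carried over, 2 new ones
```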
Analysis
The various steps involved in the analysis of sampled data are described here in a
block design setting. Suppose an experiment is conducted with t treatments replicated
r times, and let n observations be made in each plot. We assume the following
linear additive model for the block design.
$$Y_{ijk} = \mu + \tau_i + \beta_j + e_{ij} + \eta_{ijk}$$
where $Y_{ijk}$ is the observation on the kth sample for the ith treatment in the jth
replicate (i = 1,2,...,t; j = 1,2,...,r; k = 1,2,...,n), $\mu$ is the general mean effect,
$\tau_i$ is the effect of the ith treatment, $\beta_j$ is the effect of the jth replication,
$e_{ij}$ is the plot error distributed as $N(0, \sigma_e^2)$, and $\eta_{ijk}$ is the
sampling error distributed as $N(0, \sigma_s^2)$.
The analysis of variance will be of the form given below:
ANOVA
Source                     D.F.         S.S.    M.S.      E(M.S.)
Replication                (r-1)        SSR
Treatments                 (t-1)        SST               $\sigma_s^2 + n\sigma_e^2 + \frac{rn}{t-1}\sum_i (\tau_i - \bar{\tau}_.)^2$
Treatment x Replication    (t-1)(r-1)   SSRT    $s_1^2$   $\sigma_s^2 + n\sigma_e^2$
(Plot error)
Sampling Error             rt(n-1)      SSE     $s_2^2$   $\sigma_s^2$
(Samples within plots)
Total                      rtn-1
The sampling error is estimated as $\hat{\sigma}_s^2 = s_2^2$.
The plot error is estimated as $\hat{\sigma}_e^2 = \dfrac{s_1^2 - s_2^2}{n}$.
When $\hat{\sigma}_e^2$ is negative, it is taken as zero.
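These two estimates, including the rule that a negative plot-error estimate is set to zero, can be written as a short Python function. The function name and the mean squares in the usage lines are hypothetical.

```python
def variance_components(s1_sq, s2_sq, n):
    """Estimate (sampling variance, plot-error variance) from the plot-error
    mean square s1_sq, the sampling-error mean square s2_sq and n samples per plot."""
    sigma_s_sq = s2_sq
    sigma_e_sq = max((s1_sq - s2_sq) / n, 0.0)  # negative estimates are taken as zero
    return sigma_s_sq, sigma_e_sq

print(variance_components(2.0, 1.0, 4))  # (1.0, 0.25)
print(variance_components(0.5, 1.0, 4))  # (1.0, 0.0)
```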
The variance of the ith treatment mean ($\bar{Y}_{i..}$) based on r replications and n
samples per plot is
$$\frac{\sigma_s^2 + n\sigma_e^2}{rn}$$
The estimated variance of $\bar{Y}_{i..}$ is
$$\frac{\hat{\sigma}_s^2 + n\hat{\sigma}_e^2}{rn}$$
Taking the number of sampling units in a plot to be large (infinite), the estimated
variance of a treatment mean when there is complete recording (i.e. the entire plot is
harvested) is
$$\frac{\hat{\sigma}_e^2}{r}$$
The efficiency of sampling as compared to complete recording is
$$\frac{\hat{\sigma}_e^2 / r}{(\hat{\sigma}_s^2 + n\hat{\sigma}_e^2) / rn}$$
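The two variances and their ratio translate directly into Python. This is a sketch with hypothetical function names and inputs. Note that r cancels in the ratio, so the efficiency depends only on n and on the two variance components.

```python
def var_treatment_mean(sigma_s_sq, sigma_e_sq, r, n):
    """Estimated variance of a treatment mean with n samples per plot."""
    return (sigma_s_sq + n * sigma_e_sq) / (r * n)

def efficiency_of_sampling(sigma_s_sq, sigma_e_sq, r, n):
    """Variance under complete recording divided by variance under sampling."""
    complete = sigma_e_sq / r
    return complete / var_treatment_mean(sigma_s_sq, sigma_e_sq, r, n)

# Hypothetical components: sampling variance 1.0, plot-error variance 0.25.
print(var_treatment_mean(1.0, 0.25, r=4, n=4))      # 0.125
print(efficiency_of_sampling(1.0, 0.25, r=4, n=4))  # 0.5
```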
The standard error of a treatment mean ($\bar{Y}_{i..}$) with n samples per plot and with r
replications is
$$\left[ \frac{\hat{\sigma}_s^2}{rn} + \frac{\hat{\sigma}_e^2}{r} \right]^{1/2}$$
The percentage standard error or coefficient of variation is
$$p = \left[ \left( \frac{\hat{\sigma}_s^2}{rn} + \frac{\hat{\sigma}_e^2}{r} \right)^{1/2} \Big/ \bar{Y}_{i..} \right] \times 100$$
Thus
$$n = \frac{\hat{\sigma}_s^2}{r} \left[ \frac{1}{\dfrac{p^2 \bar{Y}_{i..}^2}{(100)^2} - \dfrac{\hat{\sigma}_e^2}{r}} \right]$$
For any given r and p, there will be t values of n corresponding to the t treatment means.
The maximum n will ensure the estimation of any treatment mean with a standard error
not exceeding p percent.
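For reference, the expression for the number of samples per plot follows from the percentage standard error in three steps. The derivation below is a sketch that uses only the quantities already defined. It also makes explicit that a solution exists only when $p^2 \bar{Y}_{i..}^2 / (100)^2$ exceeds $\hat{\sigma}_e^2 / r$.

```latex
\begin{align*}
\frac{p}{100}\,\bar{Y}_{i..}
  &= \left( \frac{\hat{\sigma}_s^2}{rn} + \frac{\hat{\sigma}_e^2}{r} \right)^{1/2} \\
\frac{p^2\,\bar{Y}_{i..}^2}{(100)^2}
  &= \frac{\hat{\sigma}_s^2}{rn} + \frac{\hat{\sigma}_e^2}{r} \\
\frac{\hat{\sigma}_s^2}{rn}
  &= \frac{p^2\,\bar{Y}_{i..}^2}{(100)^2} - \frac{\hat{\sigma}_e^2}{r} \\
n &= \frac{\hat{\sigma}_s^2}{r}
     \left[ \frac{p^2\,\bar{Y}_{i..}^2}{(100)^2} - \frac{\hat{\sigma}_e^2}{r} \right]^{-1}
\end{align*}
```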
The sum of squares due to different components of ANOVA can be obtained as follows :
Form a two-way table between replications and treatments, each cell figure being the
total over all samples from a plot.
$$\text{Grand Total (G.T.)} = \sum_i \sum_j \sum_k y_{ijk}, \qquad \text{Correction factor (C.F.)} = \frac{(G.T.)^2}{rtn}$$
Total S.S. (of the two-way table):
$$\text{Total S.S.} = \sum_i \sum_j \frac{\left( \sum_k y_{ijk} \right)^2}{n} - C.F.$$
$$T_i = i\text{th treatment total} = \sum_j \sum_k y_{ijk}$$
$$R_j = j\text{th replication total} = \sum_i \sum_k y_{ijk}$$
$$\text{Treatment S.S.} = \sum_i \frac{T_i^2}{rn} - C.F., \qquad \text{Replication S.S.} = \sum_j \frac{R_j^2}{tn} - C.F.$$
Replication x Treatment S.S. = Total S.S. - Replication S.S. - Treatment S.S.
$$\text{Total S.S. of the entire data} = \sum_i \sum_j \sum_k y_{ijk}^2 - C.F.$$
S.S. due to sampling error = Total S.S. of the entire data - Replication S.S. -
Treatment S.S. - Replication x Treatment S.S.
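The computations above can be collected in one Python function. This is a minimal sketch in plain Python; the function name, the nested-list layout y[i][j][k] (treatment, replication, sample) and the tiny data set are assumptions made for illustration.

```python
def sampling_anova_ss(y):
    """Sums of squares for a block design with sampling; y[i][j][k] is the
    kth sample of treatment i in replication j."""
    t, r, n = len(y), len(y[0]), len(y[0][0])
    plot_totals = [[sum(y[i][j]) for j in range(r)] for i in range(t)]
    grand_total = sum(sum(row) for row in plot_totals)
    cf = grand_total ** 2 / (r * t * n)  # correction factor

    table_ss = sum(c ** 2 for row in plot_totals for c in row) / n - cf
    trt_ss = sum(sum(row) ** 2 for row in plot_totals) / (r * n) - cf
    rep_totals = [sum(plot_totals[i][j] for i in range(t)) for j in range(r)]
    rep_ss = sum(R ** 2 for R in rep_totals) / (t * n) - cf
    plot_error_ss = table_ss - rep_ss - trt_ss
    total_ss = sum(v ** 2 for trt in y for plot in trt for v in plot) - cf
    sampling_ss = total_ss - rep_ss - trt_ss - plot_error_ss
    return {
        "replication": rep_ss,
        "treatment": trt_ss,
        "plot_error": plot_error_ss,
        "sampling_error": sampling_ss,
        "total": total_ss,
    }

# Tiny hypothetical data set: 2 treatments x 2 replications x 2 samples.
y = [[[1, 3], [2, 4]],
     [[5, 7], [6, 10]]]
ss = sampling_anova_ss(y)
print(ss)
```

For this small example the sampling-error S.S. equals 14, which is also the sum of the within-plot sums of squares (2 + 2 + 2 + 8), a useful cross-check.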
Exercise: To study the effect of differences in the number of plants per hill on the growth
of Maize crop, a randomized block design was laid out at the Agricultural College Farm,
Poona. The treatments tried were A - one plant per hill, B - two plants per hill, C - three
plants per hill, D - four plants per hill.
The net plot size used in the layout was 26’ x 20’ and the spacing between hills was 2’ x
2’. The table below gives the data on the length (in inches) of 5 cobs randomly selected
from each plot:
Length of cobs (in inches)
                             Treatments
Replication   Cob number   A      B      C      D
I             1            9.3    9.0    8.6    6.4
              2            8.8    9.0    7.0    7.2
              3            9.0    10.5   8.4    6.8
              4            8.8    8.9    9.1    7.7
              5            8.6    9.2    8.2    6.0
II            1            10.2   9.7    9.0    6.4
              2            9.0    10.0   8.0    7.4
              3            9.4    9.2    8.1    6.8
              4            9.6    10.5   8.2    6.8
              5            9.8    10.3   7.0    6.6
III           1            9.9    8.4    7.5    6.3
              2            10.4   9.4    7.5    6.7
              3            11.0   8.2    8.5    6.0
              4            10.8   9.1    8.0    7.0
              5            10.0   9.8    8.6    7.3
IV            1            10.6   8.8    7.0    8.4
              2            9.2    9.3    7.3    7.8
              3            9.9    9.9    7.6    8.0
              4            10.4   9.0    6.7    8.4
              5            9.9    8.0    6.5    7.5
V             1            10.4   11.0   9.9    7.7
              2            9.0    10.4   9.0    7.0
              3            9.7    9.0    8.9    7.0
              4            9.3    10.2   8.9    6.7
              5            9.6    9.6    9.4    7.2
(a) Analyze the data and find the standard error of treatment means.
(b) Estimate the plot and sampling components of error variance and use these estimates
to find out the relative efficiency of sampling.
(c) Prepare a table giving the minimum number of sampling units per plot necessary to
estimate the treatment means with 4 and 5 per cent standard error when the number of
replications is 5 and 6.
Calculations
Step 1: Form the following two way table between replications and treatments, each cell
figure being the total of cob lengths in five samples from a plot.
                      Treatments
Replication   A       B       C       D       Total
I             44.5    46.6    41.3    34.1    166.5
II            48.0    49.7    40.3    34.0    172.0
III           52.1    44.9    40.1    33.3    170.4
IV            50.0    45.0    35.1    40.1    170.2
V             48.0    50.2    46.1    35.6    179.9
Total         242.6   236.4   202.9   177.1   859.0
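The plot totals can be recomputed from the cob-length data with a few lines of Python, which is a convenient check on the two-way table. The variable names are hypothetical; the values are those of the exercise. The cell for replication I, treatment D comes out as 34.1, consistent with the row total of 166.5.

```python
# Cob lengths (inches): for each replication, five rows (cobs) of [A, B, C, D].
cob_lengths = {
    "I":   [[9.3, 9.0, 8.6, 6.4], [8.8, 9.0, 7.0, 7.2], [9.0, 10.5, 8.4, 6.8],
            [8.8, 8.9, 9.1, 7.7], [8.6, 9.2, 8.2, 6.0]],
    "II":  [[10.2, 9.7, 9.0, 6.4], [9.0, 10.0, 8.0, 7.4], [9.4, 9.2, 8.1, 6.8],
            [9.6, 10.5, 8.2, 6.8], [9.8, 10.3, 7.0, 6.6]],
    "III": [[9.9, 8.4, 7.5, 6.3], [10.4, 9.4, 7.5, 6.7], [11.0, 8.2, 8.5, 6.0],
            [10.8, 9.1, 8.0, 7.0], [10.0, 9.8, 8.6, 7.3]],
    "IV":  [[10.6, 8.8, 7.0, 8.4], [9.2, 9.3, 7.3, 7.8], [9.9, 9.9, 7.6, 8.0],
            [10.4, 9.0, 6.7, 8.4], [9.9, 8.0, 6.5, 7.5]],
    "V":   [[10.4, 11.0, 9.9, 7.7], [9.0, 10.4, 9.0, 7.0], [9.7, 9.0, 8.9, 7.0],
            [9.3, 10.2, 8.9, 6.7], [9.6, 9.6, 9.4, 7.2]],
}

# Plot total = sum over the five cobs, per replication and treatment column.
plot_totals = {
    rep: [round(sum(column), 1) for column in zip(*rows)]
    for rep, rows in cob_lengths.items()
}
treatment_totals = [
    round(sum(totals[i] for totals in plot_totals.values()), 1) for i in range(4)
]
grand_total = round(sum(treatment_totals), 1)

for rep, totals in plot_totals.items():
    print(rep, totals, round(sum(totals), 1))
print("Total", treatment_totals, grand_total)
```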
Step 2: Calculation of sums of squares and analysis of variance.
The various sums of squares can be obtained using the formulae given above, and the
Analysis of Variance table can then be prepared.
ANOVA
Source                                  D.F.   S.S.     M.S.    F
Replication                             4      4.91     1.23    0.59
Treatment                               3      112.09   37.36   18.05**
Replication x Treatment (plot error)    12     24.88    2.07    6.68*
Samples within plots (sampling error)   80     24.91    0.31
Total                                   99     166.79
** denotes significant at 1 percent level and * significant at 5 percent level.
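The mean squares and F ratios can be reproduced from the sums of squares with the following sketch. The function and key names are hypothetical. Replication and treatment are tested against the plot-error mean square, and plot error against the sampling-error mean square. Because the code does not round the mean squares before dividing, its F values differ slightly from the tabulated ones (about 18.02 and 6.66 rather than 18.05 and 6.68).

```python
def anova_table(ss, r, t, n):
    """Degrees of freedom, mean squares and F ratios for a block design with sampling."""
    df = {
        "replication": r - 1,
        "treatment": t - 1,
        "plot_error": (r - 1) * (t - 1),
        "sampling_error": r * t * (n - 1),
    }
    ms = {source: ss[source] / df[source] for source in df}
    f = {
        "replication": ms["replication"] / ms["plot_error"],
        "treatment": ms["treatment"] / ms["plot_error"],
        "plot_error": ms["plot_error"] / ms["sampling_error"],
    }
    return df, ms, f

# Sums of squares from the ANOVA table of the exercise.
ss = {"replication": 4.91, "treatment": 112.09,
      "plot_error": 24.88, "sampling_error": 24.91}
df, ms, f = anova_table(ss, r=5, t=4, n=5)
for source in df:
    print(source, df[source], round(ms[source], 2), round(f.get(source, float("nan")), 2))
```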
The mean square $s_1^2$ is first tested against $s_2^2$. If (i) $s_1^2$ is significant, then
treatments are tested against $s_1^2$, and if (ii) $s_1^2$ is not significant, the treatments
are tested against the pooled mean square of $s_1^2$ and $s_2^2$.
In the present case $s_1^2$ is significant, so we test the treatments against $s_1^2$.
Step 3: Standard Error
Standard error of the difference between two treatment means:
$$S.E._d = \sqrt{\frac{2 s_1^2}{rn}} = \sqrt{\frac{2 \times 2.07}{5 \times 5}} = 0.4069 \text{ inches.}$$
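The value 0.4069 corresponds to the square root of 2 s₁²/(rn), which can be checked numerically. The function name in this short sketch is hypothetical.

```python
import math

def se_difference(s1_sq, r, n):
    """Standard error of the difference between two treatment means."""
    return math.sqrt(2 * s1_sq / (r * n))

print(round(se_difference(2.07, r=5, n=5), 4))  # 0.4069
```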
Step 4: Efficiency
$$\hat{\sigma}_e^2 = \frac{s_1^2 - s_2^2}{n} = \frac{2.07 - 0.31}{5} = 0.3520$$
$$\hat{\sigma}_s^2 = s_2^2 = 0.31$$
The estimated variance of $\bar{Y}_{i..}$:
$$\frac{\hat{\sigma}_s^2}{rn} + \frac{\hat{\sigma}_e^2}{r} = \frac{2.070}{25} = 0.0828$$
Estimated variance in case of complete recording:
$$\frac{\hat{\sigma}_e^2}{r} = \frac{0.352}{5} = 0.0704$$
Efficiency of sampling as compared to complete recording:
$$\frac{\hat{\sigma}_e^2 / r}{(\hat{\sigma}_s^2 + n\hat{\sigma}_e^2) / rn} = 0.85$$
Step 5: Estimation of sampling units per plot
$$n = \frac{\hat{\sigma}_s^2}{r} \left\{ \frac{1}{\dfrac{p^2 \bar{Y}_{i..}^2}{(100)^2} - \dfrac{\hat{\sigma}_e^2}{r}} \right\}$$
Thus the number of sampling units required to measure the treatment means with 4 and 5
per cent standard error when the number of replications is 5 and 6 is worked out and is
presented below.
Sampling units per plot (n)
                                 p = 4           p = 5
Treatment   Treatment mean   r = 5   r = 6   r = 5   r = 6
1           9.704            1       1       1       1
2           9.456            1       1       1       1
3           8.116            2       2       1       1
4           7.084            5       3       2       1
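The table can be recomputed with the sketch below, which applies the Step 5 formula with the rounded estimates 0.31 (sampling variance) and 0.352 (plot-error variance) and rounds up to whole units. The function name is hypothetical. With these inputs the sketch reproduces the tabulated values except for the treatment with mean 7.084 at p = 4, r = 5, where the formula evaluates to about 6.3 (7 after rounding up) rather than 5, so that cell may be worth rechecking.

```python
import math

def units_per_plot(sigma_s_sq, sigma_e_sq, mean, p, r):
    """Smallest whole number of sampling units giving a percentage
    standard error of at most p for a treatment mean."""
    denom = (p * mean / 100) ** 2 - sigma_e_sq / r
    if denom <= 0:
        return None  # target cannot be met with r replications
    return max(1, math.ceil((sigma_s_sq / r) / denom))

means = [9.704, 9.456, 8.116, 7.084]  # treatment means from the exercise
table = {
    mean: [units_per_plot(0.31, 0.352, mean, p, r) for p in (4, 5) for r in (5, 6)]
    for mean in means
}
for mean, row in table.items():
    print(mean, row)  # columns: (p=4, r=5), (p=4, r=6), (p=5, r=5), (p=5, r=6)
```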
Step 6: Conclusion
(a) The treatments are found to be highly significant.
(b) Efficiency of sampling as compared to complete recording is 85 per cent.
(c) The number of sampling units necessary to estimate treatment means with
(i) 4 per cent standard error
when number of replications is 5 is 5,
when number of replications is 6 is 3.
(ii) 5 per cent standard error
when number of replications is 5 is 2,
when number of replications is 6 is 1.