Applications of Sampling Theory
www.elsevier.com/locate/chemolab
Received 1 August 2003; received in revised form 1 January 2004; accepted 12 March 2004
Available online 28 July 2004
Abstract
A large number of analyses is carried out, e.g., for process control, product quality control for consumer safety, and environmental control
purposes. The sampling theory developed by Pierre Gy, together with the theory of stratified sampling, can be used to audit and optimize
analytical measurement protocols. A careful optimization of the sampling and measurement steps of the complete analytical procedure may
result in considerable savings in costs or in improvement of the reliability of results.
© 2004 Elsevier B.V. All rights reserved.
0169-7439/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.chemolab.2004.03.013
86 P. Minkkinen / Chemometrics and Intelligent Laboratory Systems 74 (2004) 85–94
3. Applications of fundamental sampling error model

Fundamental sampling error is the minimum error of an ideal sampling procedure. Ultimately, it depends on the number of critical particles in the samples. For homogeneous gases and liquids, it is very small, but for solids, powders, and particulate materials, especially at low concentrations of critical particles, the fundamental error can be very large. If the lot to be sampled can be treated as

Example 1.
Plant Manager: I am producing fine-ground limestone that is used in paper mills for coating printing paper. According to their specification, my product must not contain more than 5 particles larger than 5 µm per tonne. How should I sample my product?
Sampling Expert: That is a bit too general a question. Let's first define our goal. Would 20% relative standard deviation for the coarse particles be sufficient?
Plant Manager: Yes.
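The consequence of the 20% target in Example 1 can be sketched with a short calculation. This is a minimal sketch that assumes the number of coarse particles found in a sample follows a Poisson distribution, so that the relative standard deviation of a count is 1/sqrt(count); that assumption and the function names are illustrative and are not taken from the text above.

```python
def particles_needed(target_rsd):
    """Expected particle count giving the target relative standard deviation.

    Assumes Poisson counting statistics: rsd = 1 / sqrt(count).
    """
    return 1.0 / target_rsd ** 2


def sample_mass_tonnes(target_rsd, particles_per_tonne):
    """Sample mass (tonnes) expected to contain the required particle count."""
    return particles_needed(target_rsd) / particles_per_tonne


count = particles_needed(0.20)          # 20% relative standard deviation
mass = sample_mass_tonnes(0.20, 5.0)    # product at the limit of 5 particles/tonne
print(f"particles needed: {count:.0f}, sample mass: {mass:.0f} tonnes")
```

Under this assumption, about 25 coarse particles would have to be counted, which at the specification limit corresponds to a sample of about 5 tonnes. This illustrates why low counts of critical particles make the fundamental error large.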
error achievable in a sampling step. Therefore, the fundamental sampling error calculations give realistic estimates for the global sampling error only if the material is well mixed before sampling and all sampling and subsampling procedures are carried out with equipment and methods that follow the rules of sampling correctness defined in Gy's sampling theory. If large lots are sampled, the uncertainty of the primary samples therefore has to be estimated in a different way, e.g., by using Gy's variographic method.

Example 2. A certain cattle feed (density = 0.67 g/cm³) contains on average 0.05% of an enzyme powder that has a density of 1.08 g/cm³. The size distribution of the enzyme was available, and from it the characteristic particle size was estimated as d = 1.00 mm and the size factor as g = 0.5. Estimate the fundamental sampling error for the following analytical procedure.

The actual concentration of a 25-kg bag is estimated by first taking a 500-g sample from it. This material is ground to a particle size of 0.5 mm. Then the enzyme is extracted from a 2-g sample with a suitable solvent, and the concentration is determined by liquid chromatography. The relative standard deviation of the chromatographic measurement is 5%.

To estimate the errors of the two sampling steps, we have the following material properties:

M1 = 500 g        M2 = 2.0 g       sample sizes
ML1 = 25000 g     ML2 = 500 g      lot sizes
d1 = 0.1 cm       d2 = 0.05 cm     particle sizes
g1 = 0.5          g2 = 0.25        estimated size distribution factors
aL = 0.05%
α = 100%
ρc = 1.08 g/cm³
ρm = 0.67 g/cm³
f = 0.5                            default value for spheroidal particles
β = 1                              liberated particles

These values give for the constitution factor (Eq. (6)) the value c = 2160 g/cm³, and for the sampling constants (Eq. (5)), C values

The total relative standard deviation can now be estimated by applying the rule of propagation of errors:

\[ s_t = \sqrt{\sum_i s_{ri}^2} = 0.143 = 14.3\% \]

The largest error is generated in preparing the 2-g sample for the extraction of the enzyme. To improve the overall precision, this step should be modified first. The recommendation from this exercise is that either a larger sample should be used for the extraction, or the primary sample should be pulverized to a finer particle size before secondary sampling, whichever is more economic in practice.

Example 3. Evaluate the feasibility of the following procedure for calibrating an IR spectrometer for the determination of quartz in mineral mixtures. To prepare the calibration standards, pure minerals (d = 1 mm) were ground individually for 2 min in a swing mill. Then 30 mg to 2.95 g of each mineral was carefully weighed to obtain the designed composition. The material was carefully mixed for 3 min in a Retsch Spectro Mill, and 20 mg of the mineral mixture was carefully weighed into 4.98 g of KBr and mixed for 3 min in a Retsch Spectro Mill. The mineral-KBr mixture (200 mg) was pressed into a tablet for the IR measurement. It was evaluated that the IR beam covered 38% of the area of the sample tablet. The method was developed for quartz concentrations from 1% to 10%.

The dilution factor 0.02 g/5.0 g = 0.004 is needed to evaluate aL, the concentration of quartz in the KBr tablets.

The procedure has three steps generating sampling errors. These are

(1) taking the 20-mg mineral sample from the homogenized mineral mixture to be mixed in KBr:
    lot size = ML1 = 5 g
    sample size = MS1 = 0.02 g
(2) the calibration tablet preparation:
    lot size = ML2 = 5 g
    sample size = MS2 = 0.2 g
(3) IR measurement:
    lot size = ML3 = 200 mg
    sample size = MS3 = 38% of 0.2 g = 76 mg

The following material properties were estimated:
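As a check on Example 2, the quoted constitution factor and total relative standard deviation can be reproduced with a short script. This is a sketch under stated assumptions: Eqs. (5) and (6) are not reproduced in this section, so the commonly used forms of Gy's formula are assumed here, namely c = ((1 - aL/α)²/(aL/α)) ρc + (1 - aL/α) ρm, C = f g β c, and s_r² = C d³ (1/M_S - 1/M_L). The function names are illustrative only.

```python
import math


def constitution_factor(a_l, alpha, rho_c, rho_m):
    """Constitution factor c in g/cm^3 (assumed standard form of Gy's formula)."""
    ratio = a_l / alpha
    return (1 - ratio) ** 2 / ratio * rho_c + (1 - ratio) * rho_m


def fse_variance(c, f, g, beta, d_cm, m_sample, m_lot):
    """Relative variance of the fundamental sampling error of one sampling step."""
    sampling_constant = f * g * beta * c            # C = f * g * beta * c
    return sampling_constant * d_cm ** 3 * (1 / m_sample - 1 / m_lot)


# Material properties listed in Example 2 (concentrations as mass fractions).
c = constitution_factor(a_l=0.0005, alpha=1.0, rho_c=1.08, rho_m=0.67)

var_step1 = fse_variance(c, f=0.5, g=0.5, beta=1.0, d_cm=0.1,
                         m_sample=500.0, m_lot=25000.0)   # 500 g from the 25-kg bag
var_step2 = fse_variance(c, f=0.5, g=0.25, beta=1.0, d_cm=0.05,
                         m_sample=2.0, m_lot=500.0)       # 2 g from the ground 500 g
var_analysis = 0.05 ** 2                                   # chromatographic measurement

s_total = math.sqrt(var_step1 + var_step2 + var_analysis)

print(f"c is close to the quoted 2160 g/cm^3: {c:.1f}")
print(f"step 1: {math.sqrt(var_step1):.1%}")   # 3.3%
print(f"step 2: {math.sqrt(var_step2):.1%}")   # 13.0%
print(f"total:  {s_total:.1%}")                # 14.3%
```

With these assumed formulas the 2-g subsampling step contributes about 13% and the primary 500-g sample about 3%, which is consistent with the quoted total of 14.3% and with the conclusion that the 2-g step dominates.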
Fig. 3. Standard deviation estimates obtained in Example 3 for the three sampling steps (1-3) and for the total standard deviation (4) in the calibration of the IR spectrometer for quartz determination.

Results of the fundamental sampling error estimation are shown in Fig. 3 as a function of the quartz concentration in the original sample mixture (without dilution with KBr), for each step separately and for the total three-step calibration procedure.

4. Optimization of sampling plans based on stratified sampling

Theory of stratified sampling can be used to optimize sampling plans. This subject has been treated, e.g., by Sommer [5] and Cochran [7]. Sommer's approach is followed in this presentation. Sometimes, stratification is natural, e.g., if the lot to be investigated consists of bags, containers, wagon loads, etc. In sampling process streams, where no clear stratum borders can be found, the strata can be selected by the sampler. As Gy and Sommer [5] have shown, stratified sampling usually gives smaller uncertainties for the mean value than random sampling, and at worst equal ones.

4.1.1. Lot
The lot consists of N1 strata (sublots) of equal sizes. Of these, n1 strata are selected from which the

4.1.3. Analytical samples
At this level, n3 is the number of analytical samples prepared from each primary sample, N3 is the size of the primary sample, expressed as the number of potential analytical samples that could be prepared from it, σ3 is the standard deviation of the preparation of the analytical samples (i.e., the standard deviation between analytical samples taken from a primary sample), and c3 is the unit cost of preparing an analytical sample.

For optimization purposes, the unit costs, ci, can be given either as currency units or as relative costs, e.g., as time required to carry out the given sampling operation. Because the strata and the units within the strata are in most cases autocorrelated, especially in process analysis, and the sampling variances depend on sampling strategy (systematic, stratified, or random selection), the variances should in general be estimated by using Gy's variographic method. Analysis of variance based on the design shown in Fig. 4 is recommended in many statistical textbooks as the method for estimating the variance components. Because this method does not take autocorrelation into account, it should be used only if strictly random sample selection is used (not recommended) or if there is no autocorrelation between the sampling units.

Fig. 4. Nested sampling plan.
Because the strata have equal sizes, the mean of the lot can be calculated as the unweighted mean of the analytical results.

Total number of samples analyzed:

\[ n_t = n_1 n_2 n_3 \tag{6} \]

Mean of the lot:

\[ \bar{x} = \frac{1}{n_1 n_2 n_3} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \sum_{k=1}^{n_3} x_{ijk} \tag{7} \]

Variance of the lot mean:

\[ \sigma_{\bar{x}}^2 = \frac{N_1 - n_1}{N_1 - 1}\,\frac{\sigma_1^2}{n_1} + \frac{N_2 - n_2}{N_2 - 1}\,\frac{\sigma_2^2}{n_1 n_2} + \frac{N_3 - n_3}{N_3 - 1}\,\frac{\sigma_3^2}{n_1 n_2 n_3} \tag{8a} \]

Eq. (8a) shows that if a sample can be taken from every stratum (N1 = n1), the between-strata variance is completely eliminated from the variance of the mean. On the other hand, if at all levels the sample is small in comparison to the lot from which it is taken, this equation simplifies to

By substituting the values for n_i (i > 1) in Eq. (9), n_1 can now be solved:

\[ n_1 = \frac{c_{\max}}{c_1 + n_2 c_2 + n_2 n_3 c_3}, \quad \text{rounded to the lower integer}, \quad 1 \le n_1 \le N_1 \tag{11} \]

4.1.3.2. Target value for the variance of the mean fixed, total cost minimized. If $\sigma_T^2$ is the target value of the variance of the mean of the procedure, then the protocol has to be designed so that the variance of the mean of the selected procedure satisfies $\sigma_{\bar{x}}^2 \le \sigma_T^2$.

For levels i > 1, Eq. (10) also applies in this case. The number of sublots to be sampled at level 1 can now be solved either from Eq. (8a) (exact solution, Eq. (12a)) or from Eq. (8b) (approximate solution, Eq. (12b), when all $N_i \gg n_i$):

\[ n_1 = \frac{\dfrac{N_1}{N_1 - 1}\,\sigma_1^2 + \dfrac{N_2 - n_2}{N_2 - 1}\,\dfrac{\sigma_2^2}{n_2} + \dfrac{N_3 - n_3}{N_3 - 1}\,\dfrac{\sigma_3^2}{n_2 n_3}}{\sigma_T^2 + \dfrac{\sigma_1^2}{N_1 - 1}} \tag{12a} \]

\[ n_1 = \frac{\sigma_1^2 + \dfrac{\sigma_2^2}{n_2} + \dfrac{\sigma_3^2}{n_2 n_3}}{\sigma_T^2} \tag{12b} \]
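The relationship between Eqs. (8a), (11), and (12a) can be illustrated with a small script. This is a minimal sketch: the lot sizes, standard deviations, and unit costs are hypothetical values chosen only for illustration, and the function names are not from the paper.

```python
import math


def variance_of_lot_mean(N, n, sigma):
    """Variance of the lot mean for a three-level nested plan, Eq. (8a).

    N, n, sigma: per-level sequences (N_i, n_i, sigma_i); every N_i must exceed 1.
    """
    variance = 0.0
    samples_so_far = 1
    for N_i, n_i, sigma_i in zip(N, n, sigma):
        samples_so_far *= n_i                      # n1, then n1*n2, then n1*n2*n3
        finite_lot_factor = (N_i - n_i) / (N_i - 1)
        variance += finite_lot_factor * sigma_i ** 2 / samples_so_far
    return variance


def n1_for_fixed_cost(c_max, costs, n2, n3, N1):
    """Number of strata affordable within c_max, Eq. (11), kept within 1..N1."""
    c1, c2, c3 = costs
    n1 = math.floor(c_max / (c1 + n2 * c2 + n2 * n3 * c3))
    return max(1, min(N1, n1))


def n1_for_target_variance(N, n2, n3, sigma, var_target):
    """Continuous n1 that meets var_target exactly, Eq. (12a); round up in practice."""
    N1, N2, N3 = N
    s1, s2, s3 = sigma
    numerator = (N1 / (N1 - 1) * s1 ** 2
                 + (N2 - n2) / (N2 - 1) * s2 ** 2 / n2
                 + (N3 - n3) / (N3 - 1) * s3 ** 2 / (n2 * n3))
    return numerator / (var_target + s1 ** 2 / (N1 - 1))


# Hypothetical plan: 20 strata, 50 potential primary samples per stratum,
# 10 potential analytical samples per primary sample.
N = (20, 50, 10)
sigma = (3.0, 2.0, 1.0)

var_5 = variance_of_lot_mean(N, (5, 2, 2), sigma)         # plan with n1 = 5
n1_back = n1_for_target_variance(N, 2, 2, sigma, var_5)   # recovers n1 = 5
n1_cost = n1_for_fixed_cost(100.0, (5.0, 2.0, 1.0), 2, 2, N1=20)

print(f"variance with n1 = 5: {var_5:.4f}")
print(f"n1 recovered from Eq. (12a): {n1_back:.3f}")
print(f"n1 affordable for c_max = 100: {n1_cost}")
```

Feeding the variance obtained from Eq. (8a) back into Eq. (12a) returns the original n1, which is a convenient consistency check when implementing these formulas.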
Fig. 5. Operating characteristics of the optimized inspection protocol for cobalt determination. The x-axis shows the true mean value of the lot, and the y-axis the probability that the mean value of the inspection exceeds the specification of 150 g/tonne (producer's risk) or the probability that the inspection value is below the specification (buyer's risk).

\[ c_t = \sum_{i=1}^{k} n_i c_i \tag{16a} \]

\[ c_t = n_t c^*, \quad \text{if } c_1 = c_2 = \dots = c_k = c^* \tag{16b} \]

Optimization of the investigation involves the optimal allocation, between the strata, of the total number of samples that can be analyzed. The mathematical optimum can again be derived by assuming that n_i is continuous. The results given below have been derived assuming that investigation costs are independent of strata (Eq. (16b) valid). The optimization strategy depends on how much information is available. If only the sizes of strata are known and the total cost c_t of the investigation is fixed, then the best strategy is to allocate the samples proportionally to the sizes of strata:

\[ n_i = W_i n_t \tag{17} \]

and

\[ n_t = \frac{c_t}{c^*} \tag{18} \]

Both n_t and n_i have to be rounded to integers so that the total cost will not be exceeded.

If the unit costs and standard deviations are available, then even better plans can be designed. Laboratories usually follow their costs and, consequently, good cost estimates are available. The standard deviations can be estimated either by using Gy's sampling theory, if the material properties needed are available, or experimentally from a pilot study. If the quality control of the analytical laboratory is well planned, it also provides data that can be used for optimization of sampling and analytical procedures.

A cost-optimal plan for the investigation can be derived in two ways. Either the total cost of the investigation is fixed and the variance of the grand mean of the lot is minimized, or the target value for the variance is given and the total cost is minimized. By assuming that Eqs. (15b) and (16b) are valid, that is, in all strata the samples are small in comparison to the sizes of strata and the cost of investigation is independent of strata, the following results can be derived.

4.2.1. Maximum value, cmax, given to the total cost, variance of the lot mean minimized

\[ n_i = \frac{W_i \sigma_i}{\sum_{i=1}^{k} W_i \sigma_i} \cdot \frac{c_{\max}}{c^*} \tag{19} \]

If instead the target value $\sigma_T^2$ is given for the variance of the lot mean and the total cost is minimized:

\[ n_i = \frac{W_i \sigma_i}{\sigma_T^2} \sum_{i=1}^{k} W_i \sigma_i \tag{20} \]

Again, n_i has to be rounded to integers so that the required standard deviation of the lot mean will not be exceeded, i.e., $\sigma_{\bar{\bar{x}}}^2 \le \sigma_T^2$.

Example 5 (Optimal design for estimating sulfur balance of a pulp mill). Estimation of the sulfur balance for a pulp mill is a difficult task (Fig. 7). Sulfur enters the mill in raw materials (water, wood) and in chemicals. The outflowing streams consist of products, wastewater, solid wastes, and atmospheric emissions. Initial calculations showed that the mean values of all streams except the emissions into the atmosphere could be estimated reliably. Atmospheric emissions comprised about one quarter of the total sulfur. Estimation of atmospheric sulfur emissions in an old pulp mill, like the one where this study was carried out, is difficult, and therefore optimization of the emission measurement plan is a challenging task. This is because there is a large number of gaseous outlets into the atmosphere. These have different mass flows, the concentrations of the sulfur compounds are highly variable, and sulfur is found in dust and in many gaseous compounds [SO2, H2S, CH3SH, (CH3)2S, and (CH3)2S2]. In optimization, all different emission sources and sulfur-containing compounds analyzed had to be treated as separate strata.

4.2.2.1. Design requirements.
- Material balance for sulfur required annually
- A two-man team with portable instruments available for atmospheric emission control measurements
- To avoid too much time spent in transporting and setting up the instruments, a one-week measurement period is carried out at each location where the team is working; that is, 52 periods/year are available and should be allocated optimally
- For optimization, the emission sources were grouped into six groups, which could be measured during 1 week from one station in the field

4.2.2.2. Acquisition of basic data. Existing records supplemented with new experiments were used to estimate:

Fig. 7. To estimate the uncertainty of the annual sulfur balance in a pulp factory, the standard deviations of the means of several material streams have to be estimated. The streams have different sizes, contain sulfur in many different compounds, and in many streams, high variability is characteristic of the sulfur-containing compounds. This is especially the case for atmospheric emissions.
Table 1
Relative sizes, relative standard deviations of one sample/measurement period, allocation of 32 samples (measurement periods) that can be made during one week, and relative standard deviations of the mean of 1 week in emission group #6

Stratum no.   Wi      sri (%)   ni   sr(x̄) (%)
1             0.143   30        5    13.4
2             0.285   25        9    8.3
3             0.002   40        1    40
4             0.285   25        9    8.3
5             0.285   25        9    8.3
Total         1.000             32   sr(x̿) = 4.5%

Fig. 9. Variogram calculated from the heterogeneities of Fig. 8 (upper part) and relative standard deviation estimates for systematic and stratified sample selection as a function of the sampling interval, calculated from the variogram (lower part).

Table 2
Relative sizes, total relative standard deviations between the means of 1-week measurement periods, and allocation of 50 measurement weeks between the different emission source groups

Emission group no.   Wi     sri (%)   ni
1                    0.14   14.6      10
2                    0.18   20.8      17
3                    0.04   14.8      3
4                    0.02   7.9       1
5                    0.02   10.9      1
6                    0.60   6.4       18
Standard deviation of the annual mean of sulfur emission: sr(x̿) = 1.5%
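The summary values in Tables 1 and 2 can be checked with a short calculation. This sketch assumes that the relative standard deviation of the grand mean is sqrt(sum of (W_i s_ri)² / n_i), which is the usual expression for a stratified mean when the samples are small compared with the strata; the paper's own equation for this is not shown in this section, and the function name is illustrative.

```python
import math


def rsd_of_stratified_mean(weights, rsd_percent, n):
    """Relative standard deviation (%) of the weighted mean over all strata."""
    return math.sqrt(sum((w * s) ** 2 / n_i
                         for w, s, n_i in zip(weights, rsd_percent, n)))


# Table 1: strata within emission group #6
w1 = (0.143, 0.285, 0.002, 0.285, 0.285)
s1 = (30, 25, 40, 25, 25)
n1 = (5, 9, 1, 9, 9)

# Table 2: the six emission source groups
w2 = (0.14, 0.18, 0.04, 0.02, 0.02, 0.60)
s2 = (14.6, 20.8, 14.8, 7.9, 10.9, 6.4)
n2 = (10, 17, 3, 1, 1, 18)

rsd_week = rsd_of_stratified_mean(w1, s1, n1)
rsd_year = rsd_of_stratified_mean(w2, s2, n2)

print(f"Table 1, mean of one week: {rsd_week:.1f}%")   # 4.5%
print(f"Table 2, annual mean:      {rsd_year:.1f}%")   # 1.5%
print(f"Stratum 1 of Table 1:      {s1[0] / math.sqrt(n1[0]):.1f}%")   # 13.4%
```

Under this assumption the script reproduces the 4.5% and 1.5% values given in the tables, as well as the per-stratum value of 13.4% in the first row of Table 1.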
Table 2 shows that regardless of the high heterogeneity of sulfur in gaseous emissions, the mean can be estimated fairly accurately at an annual level. When the emission measurements were carried out approximately according to the optimized plan, the unaccounted sulfur in the annual material balance was only about 1%, which is in good agreement with the theoretical calculation considering the complexity of the process.

5. Conclusions

Sampling theory can, and should, be applied in all steps of analytical procedures, from the planning to the analytical measurements. At the moment, it is largely neglected. A large amount of analytical resources is devoted to quality control and environmental emission estimation in industries. At national levels, large programs are devoted, e.g., to monitoring the state of the environment and to guaranteeing the quality and safety of food. Only seldom are the data quality objectives adequately defined before the sampling campaigns, and consequently, the sampling theory is not utilized when these campaigns are designed. Often, the sampling plans have just evolved and their performance has never been critically audited. This leaves a lot of room for optimization in this field.

Before designing a sampling and analytical plan, one should carefully consider what uncertainty level is tolerated by the user of the results. If the ambition level is set too high, the investigation will be unnecessarily expensive. In general, in cost-benefit considerations, the following rule applies: if the total standard deviation is cut to half, the cost will increase four times, and if only one quarter of the original standard deviation can be tolerated, the cost will be 16 times higher, etc. Importantly, this relationship only holds if the analytical procedure has been optimized. If the resources are not optimally allocated, the required uncertainty level may not be achieved at all. On the other hand, an optimized procedure may be cheaper than the original one and still give more reliable results. An example: a pulp factory pumped the pulp slurry directly to a paper factory through a pipeline. The pulp factory had installed a process analyzer in the pipeline, and the amount of pulp pumped to the paper factory, and thus the bill for the pulp sold, was based on this measurement. When a discrepancy between the mass balances of paper produced and pulp received was noticed, the sampling and analytical chain was carefully checked. It turned out that, partly due to sampling problems and partly due to calibration problems of the analyzer, the estimated amount of pulp could be 10% too high. As the paper factory consumed about 80 000 tonnes of pulp per year, the analytical error was really expensive. By improving the sampling and calibration procedure, the systematic errors could be removed and the random errors were minimized to a level where their effect on the mean of the annual mass of pulp pumped was only a few tenths of a percent.

Acknowledgements

I want to thank my friends, Dr. Pierre Gy for developing the important sampling theory and Prof. Kim Esbensen for encouragement for writing this paper. I am also greatly indebted to numerous colleagues and coworkers with whom I have worked on sampling problems over the years.

References

[1] P.M. Gy, Sampling of Particulate Materials, Theory and Practice, Elsevier, Amsterdam, 1982.
[2] P.M. Gy, Sampling of Heterogeneous and Dynamic Material Systems, Elsevier, Amsterdam, 1992.
[3] P.M. Gy, Sampling for Analytical Purposes, Wiley, Chichester, 1998.
[4] F.F. Pitard, Pierre Gy's Sampling Theory and Sampling Practice, Second edition, CRC Press, Boca Raton, 1993.
[5] K. Sommer, Probenahme von Pulvern und körnigen Massengütern, Springer, Berlin, 1979.
[6] C.O. Ingamells, F.F. Pitard, Applied Geochemical Analysis, Wiley, New York, 1986.
[7] W.G. Cochran, Sampling Techniques, Third edition, Wiley, New York, 1977.
[8] P. Minkkinen, SAMPEX - a computer program for solving sampling problems, Chemolab 7 (1989) 189-194.