
Chemometrics and Intelligent Laboratory Systems 74 (2004) 85 – 94

www.elsevier.com/locate/chemolab

Practical applications of sampling theory


Pentti Minkkinen *
Department of Chemical Technology, Lappeenranta University of Technology, P.O. Box 20, FIN-53851 Lappeenranta, Finland

Received 1 August 2003; received in revised form 1 January 2004; accepted 12 March 2004
Available online 28 July 2004

Abstract

A large number of analyses is carried out, e.g., for process control, product quality control for consumer safety, and environmental control
purposes. The sampling theory developed by Pierre Gy, together with the theory of stratified sampling, can be used to audit and optimize
analytical measurement protocols. A careful optimization of the sampling and measurement steps of the complete analytical procedure may
result in considerable savings in costs or in improvement of the reliability of results.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Gy’s sampling theory; Stratified sampling; Optimization of sampling

* Tel.: +358-5-621-2102 (office), +358-40-504-9413 (mobile); fax: +358-5-621-2199.
E-mail address: [email protected] (P. Minkkinen).

0169-7439/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.chemolab.2004.03.013

1. Introduction

Many textbooks of analytical chemistry state that a result is no better than the sample on which it is based. Very little is said, however, about how to ensure that the sample is good. It is still largely unknown that a useful sampling theory developed for chemical analysis exists. The situation is, hopefully, slowly changing. Laboratories and consultants who carry out sampling as part of their business have started to accredit their sampling procedures, at least in Finland and probably elsewhere too. The basic requirements for accreditation are that the sampling equipment is correct, the uncertainties of the methods have been estimated, the procedures are regularly audited, and the personnel have been adequately trained for their jobs.

The most complete theory of sampling for chemical analysis, one that takes into account both the technical and the statistical aspects of sampling, has been developed by Pierre Gy. Gy's theory is presented in his books [1–3], and the latest developments are presented in the papers of this issue. Pitard [4] has also written a book about Gy's sampling theory. A useful account covering the theory of stratified sampling and the optimization of sampling procedures has been written by Sommer [5]. The purpose of this paper is to elucidate how sampling theory can help to develop cost-optimal procedures. The optimization procedures described in this paper are based on Sommer's work.

2. Design and audit of sampling procedures

The classification of sampling errors forms a logical framework for designing and auditing sampling procedures. The classification is shown in Fig. 1 (see Gy's papers in this issue for explanations of the different boxes of the figure). Auditing and designing sampling procedures normally involve the following steps:

Step 1 Check that all sampling equipment and procedures obey the rules of correct sampling. Replace incorrect equipment and procedures with correct ones. Correct sampling largely eliminates the materialization and preparation errors. A weighting error is made if the lot consists of sublots of different sizes, or if the flow rate varies during the sampling periods in process streams, and a simple average is calculated to estimate the lot mean. This error is eliminated if proportional cross-stream sampling can be carried out and the average is calculated as the mean weighted by the sample sizes.

Step 2 Estimate the remaining errors (fundamental sampling error, grouping and segregation error, and point selection error) and their dependence on
increment size and sampling frequency. If the necessary data are not available, design pilot studies to obtain the data.

Step 3 Define the acceptable overall uncertainty level or cost of the investigation and optimize the method, i.e., the increment sizes, selection strategy (systematic or stratified), and the sampling frequency, so that the required uncertainty or cost level is achieved.

Step 1 is crucial. Normally it is difficult and expensive to estimate the uncertainties of incorrect sampling. It is also futile, because sampling biases are never constant: stream segregation is a transient phenomenon that changes all the time. Therefore, sampling correctness must be implemented preventively.

Fig. 1. Gy's classification of sampling errors according to the origin of errors. Errors can be classified into two main groups: those that originate from incorrect design or operation of the sampling equipment (materialization errors) and statistical errors.

3. Applications of fundamental sampling error model

Fundamental sampling error is the minimum error of an ideal sampling procedure. Ultimately, it depends on the number of critical particles in the samples. For homogeneous gases and liquids, it is very small, but for solids, powders, and particulate materials, especially at low concentrations of critical particles, the fundamental error can be very large. If the lot to be sampled can be treated as a one-dimensional object, fundamental sampling error models can be used to estimate the uncertainty of the sampling. If the lot cannot be treated as a one-dimensional object, at least the point selection error has to be taken into account when the variance of primary samples is estimated. If the sample preparation and size reduction by splitting are carried out correctly, fundamental sampling error models can be used for estimating the variance components generated by these steps. If the expected number of critical particles in the sample can be estimated easily as a function of sample size, the Poisson distribution or the binomial distribution can be used as the sampling model to estimate the uncertainty of the sample. In most cases, the fundamental sampling error model developed by Gy has to be used.

3.1. Estimation of fundamental sampling error by using Poisson distribution

The Poisson distribution describes the random distribution of rare events in a given time or space interval. If the average number of critical particles expected in the sample can be estimated, the standard deviation of the sample can be estimated. The Poisson distribution, as the model for sampling error, has been treated, e.g., by Ingamells and Pitard [6]. The important property of the Poisson distribution is that the variance and the mean of the occurrences of the events in the interval inspected are identical (μn, the average number of critical particles in the sample in our case). The standard deviation expressed as the number of particles is

$$\sigma_n = \sqrt{\mu_n} \tag{1}$$

The relative standard deviation is just as easy to estimate:

$$\sigma_r = \frac{1}{\sqrt{\mu_n}} \tag{2}$$

If μn is large (say, larger than 25), the confidence interval can be estimated by replacing the Poisson distribution by the normal distribution with the same standard deviation and mean. If μn is small, the confidence intervals have to be estimated from the Poisson distribution. Example 1 describes a typical situation where the Poisson distribution can be used as the model for sampling error estimation.

Example 1.
Plant Manager: I am producing fine-ground limestone that is used in paper mills for coating printing paper. According to their specification, my product must not contain more than 5 particles/tonne larger than 5 µm. How should I sample my product?
Sampling Expert: That is a bit too general a question. Let's first define our goal. Would 20% relative standard deviation for the coarse particles be sufficient?
Plant Manager: Yes.
Sampling Expert: Well, let's consider the problem. We could use the Poisson distribution to estimate the required sample size. Let's see:
The maximum relative standard deviation sr = 20% = 0.2. From Eq. (2), we can estimate how many coarse particles there should be in the sample to have this standard deviation:

$$n = \frac{1}{s_r^2} = \frac{1}{0.2^2} = 25$$

If 1 tonne contains 5 coarse particles, this result means that the primary sample should be 5 tonne. This is a good example of an impossible sampling problem. Although you could take a 5-tonne sample, there is no feasible technology to separate and count the coarse particles in it. You should not try the traditional analytical approach in controlling the quality of your product. Instead, if the specification is really sensible, you forget the particle size analyzers and maintain the quality of your product by process technological means; that is, you take care that all the equipment is regularly serviced and kept at high performance so that the product quality is always maintained.
Plant Manager: Thank you. In light of what you said, it seems that the expensive laser diffraction particle size analyzer recommended to us will not solve our problem.

3.2. Applications of Gy's fundamental sampling error equation for designing sample preparation procedures

If the material to be sampled consists of particles having different shapes and size distributions, it is difficult to estimate the number of critical particles in the sample. Gy has derived an equation that can also be used in this case to estimate the relative variance of the fundamental sampling error:

$$\sigma_r^2 = C d^3 \left( \frac{1}{M_S} - \frac{1}{M_L} \right) \tag{3}$$

Here

$$\sigma_r = \frac{\sigma_a}{a_L} = \text{relative standard deviation of the fundamental sampling error} \tag{4}$$

where σa is the absolute standard deviation (in concentration units); aL, the average concentration of the lot; d, the characteristic particle size = 95% limit of the size distribution; MS, the sample size; ML, the lot size; and C, the sampling constant, which depends on the properties of the material sampled. C is the product of four parameters:

$$C = f g \beta c \tag{5}$$

where f is the shape factor (see Fig. 2). The shape factor is the ratio of the volume of the sampled particles having the characteristic dimension d to the volume of the cube having the same dimension. For spheroidal particles f ≈ 0.5, which can often be used as the default value for this parameter. g is the size distribution factor (g = 0.25 for a wide size distribution, and g = 1 for uniform particle sizes), and β is the liberation factor (see Fig. 2). The liberation factor is an empirical correction for materials where the critical particles are found as inclusions in the matrix particles. The liberation size L is defined as the size of the opening of a screen below which 95% of the material has to be crushed in order to liberate at least 85% of the critical particles; βmax = 1 (liberated materials and materials ground below the liberation size L), βmin = 0.03 (materials where the critical particles are very small in comparison to d; note that because β depends on the particle size d for a given material, the sampling constant C changes when the material is ground or crushed). c is the constitution factor and can be estimated by using Eq. (6) if the necessary material properties are available.

$$c = \frac{\left( 1 - \frac{a_L}{\alpha} \right)^2}{\frac{a_L}{\alpha}} \rho_c + \left( 1 - \frac{a_L}{\alpha} \right) \rho_m \tag{6}$$

Here, aL is the average concentration of the lot; α, the concentration of the analyte in the critical particles; ρc, the density of the critical particles; and ρm, the density of the matrix or diluent particles.

Fig. 2. Estimation of particle shape factor and liberation factor for unliberated and liberated critical particles. L is the particle size of the critical particles.

If the material properties are not available and they are difficult to estimate, the sampling constant C can always be estimated experimentally. International reference materials (RMs), for example, are a special group of materials for which the sampling constant should always be estimated and reported. Unfortunately, this is seldom done. If the particle size distribution and sampling constants were available to the user, the usefulness of the materials could be improved. The producers of the RMs usually carry out homogeneity tests that provide data that could be reported in a compressed form as sampling constants, but unfortunately, at the moment, these data are not fully utilized.

Below, some examples are given on how Gy's fundamental sampling error model can be used in practice to design and audit analytical procedures. Some further examples can be found in Ref. [8]. As mentioned above, the fundamental sampling error is the minimum theoretical
error achievable in a sampling step. Therefore, the fundamental sampling error calculations give realistic estimates for the global sampling error only if the material is well mixed before sampling, and all sampling and subsampling procedures are carried out with equipment and methods that follow the rules of sampling correctness defined in Gy's sampling theory. Consequently, if large lots are sampled, the uncertainty of the primary samples has to be estimated in a different way, e.g., by using Gy's variographic method.

Example 2. A certain cattle feed (density = 0.67 g/cm³) contains on average 0.05% of an enzyme powder that has a density of 1.08 g/cm³. The size distribution of the enzyme was available, and from it the characteristic particle size was estimated as d = 1.00 mm and the size factor as g = 0.5. Estimate the fundamental sampling error for the following analytical procedure.
The actual concentration of a 25-kg bag is estimated by first taking a 500-g sample from it. This material is ground to a particle size of < 0.5 mm. Then, the enzyme is extracted from a 2-g sample by using a proper solvent, and the concentration is determined by using liquid chromatography. The relative standard deviation of the chromatographic measurement is 5%.
To estimate the errors of the two sampling steps, we have the following material properties:

Primary sample     Secondary sample
M1 = 500 g         M2 = 2.0 g         ...sample sizes
ML1 = 25000 g      ML2 = 500 g        ...lot sizes
d1 = 0.1 cm        d2 = 0.05 cm       ...particle sizes
g1 = 0.5           g2 = 0.25          ...estimated size distribution factors

Both steps
aL = 0.05%
α = 100%
ρc = 1.08 g/cm³
ρm = 0.67 g/cm³
f = 0.5            ...default value for spheroidal particles
β = 1              ...liberated particles

These values give for the constitution factor (Eq. (6)) the value c = 2160 g/cm³, and for the sampling constants (Eq. (5)) the values

C1 = 540 g/cm³ and C2 = 270 g/cm³

Eq. (3) now gives the standard deviation estimates for the sampling steps:

sr1 = 0.033 = 3.3%   ...primary sample
sr2 = 0.13 = 13%     ...secondary sample
sr3 = 0.05 = 5%      ...analytical determination

The total relative standard deviation can now be estimated by applying the rule of propagation of errors:

$$s_t = \sqrt{\sum s_{ri}^2} = 0.143 = 14.3\%$$

The largest error is generated in preparing the 2-g sample for the extraction of the enzyme. To improve the overall precision, this step should be modified first. The recommendation from this exercise is that either a larger sample should be used for the extraction, or the primary sample should be pulverized to a finer particle size before secondary sampling, whichever is more economic in practice.

Example 3. Evaluate the feasibility of the following procedure for calibrating an IR spectrometer for the determination of quartz in mineral mixtures. To prepare the calibration standards, pure minerals (d = 1 mm) were ground individually for 2 min in a swing mill. Then 30 mg to 2.95 g of each mineral was carefully weighed to obtain the designed composition. The material was carefully mixed for 3 min in a Retsch Spectro Mill, and 20 mg of the mineral mixture was carefully weighed into 4.98 g of KBr and mixed for 3 min in a Retsch Spectro Mill. The mineral–KBr mixture (200 mg) was pressed into a tablet for the IR measurement. It was evaluated that the IR beam covered 38% of the area of the sample tablet. The method was developed for quartz concentrations from 1% to 10%.
The dilution factor 0.02 g/5.0 g = 0.004 is needed to evaluate aL, the concentration of quartz in the KBr tablets.
The procedure has three steps generating sampling errors. These are

(1) taking the 20-mg mineral sample from the homogenized mineral mixture to be mixed in KBr:
    lot size = ML1 = 5 g
    sample size = MS1 = 0.02 g
(2) the calibration tablet preparation:
    lot size = ML2 = 5 g
    sample size = MS2 = 0.2 g
(3) IR measurement:
    lot size = ML3 = 200 mg
    sample size = MS3 = 38% of 0.2 g = 76 mg

The following material properties were estimated:

d = 0.045 mm       ...particle size of quartz
g1 = 0.25          ...estimated size distribution factor
aL = 1.0–10%       ...concentration of quartz in the mineral mixture before dilution with KBr
α = 100%
ρc = 2.65 g/cm³
ρm = 3.0 g/cm³ (in mineral mixture)
ρm = 3.2 g/cm³ (in KBr mixture)
f = 0.5
β = 1              ...liberated particles
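The figures quoted in Examples 1 and 2 follow directly from Eqs. (2), (3), (5), and (6), so they can be checked with a few lines of code. The following is a minimal Python sketch under that reading of the equations; the function and variable names are illustrative and do not come from the paper. The same functions could be applied to the Example 3 inputs.

```python
import math

def required_particles(rel_sd):
    """Eq. (2) solved for the mean particle count: mu_n = 1 / sigma_r**2."""
    return 1.0 / rel_sd ** 2

def constitution_factor(a_lot, alpha, rho_c, rho_m):
    """Eq. (6). a_lot and alpha in the same units, densities in g/cm3."""
    ratio = a_lot / alpha
    return (1.0 - ratio) ** 2 / ratio * rho_c + (1.0 - ratio) * rho_m

def fse_rel_sd(f, g, beta, c, d_cm, m_sample, m_lot):
    """Eqs. (3) and (5): relative SD of the fundamental sampling error.
    d_cm in cm, masses in g."""
    sampling_constant = f * g * beta * c
    variance = sampling_constant * d_cm ** 3 * (1.0 / m_sample - 1.0 / m_lot)
    return math.sqrt(variance)

# Example 1: 20 % relative SD, 5 coarse particles per tonne.
n_particles = required_particles(0.20)
sample_tonnes = n_particles / 5.0

# Example 2: enzyme (0.05 %) in cattle feed.
c = constitution_factor(a_lot=0.05, alpha=100.0, rho_c=1.08, rho_m=0.67)
sr1 = fse_rel_sd(0.5, 0.5, 1.0, c, d_cm=0.1, m_sample=500.0, m_lot=25000.0)
sr2 = fse_rel_sd(0.5, 0.25, 1.0, c, d_cm=0.05, m_sample=2.0, m_lot=500.0)
sr3 = 0.05  # chromatographic measurement, given in the example
s_total = math.sqrt(sr1 ** 2 + sr2 ** 2 + sr3 ** 2)

print(f"Example 1: {n_particles:.0f} particles, {sample_tonnes:.0f} tonne sample")
print(f"Example 2: c = {c:.0f} g/cm3, sr1 = {sr1:.3f}, sr2 = {sr2:.3f}, total = {s_total:.3f}")
```

Within rounding, the sketch gives the values stated in the text (c near 2160 g/cm³, about 3.3% and 13% for the two sampling steps, and about 14.3% in total), which also makes it easy to try a larger extraction sample or a finer grind.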
Results of the fundamental sampling error estimation are shown in Fig. 3 as a function of the quartz concentration in the original sample mixture (before dilution with KBr), for each step separately and for the total three-step calibration procedure.

Fig. 3. Standard deviation estimates obtained in Example 3 for the three sampling steps (1–3) and for the total standard deviation (4) in the calibration of the IR spectrometer for quartz determination.

4. Optimization of sampling plans based on stratified sampling

The theory of stratified sampling can be used to optimize sampling plans. This subject has been treated, e.g., by Sommer [5] and Cochran [7]. Sommer's approach is followed in this presentation. Sometimes stratification is natural, e.g., if the lot to be investigated consists of bags, containers, wagon loads, etc. In sampling process streams, where no clear stratum borders can be found, the strata can be selected by the sampler. As Gy and Sommer [5] have shown, stratified sampling usually gives smaller uncertainties for the mean value than random sampling, and at worst equal ones.

4.1. Optimization of nested (hierarchical) sampling plans for lots consisting of strata of equal sizes

The nested sampling plan is described in Fig. 4. In nested sampling, the samples are taken at k levels (here k = 3). All levels contribute to the overall uncertainty of the mean of the lot, and at each level below the first sampling level, the sample of the upper level is treated as the lot for this level in the sampling chain. The quantities shown in Fig. 4 are discussed in the following subsections.

Fig. 4. Nested sampling plan.

4.1.1. Lot
The lot consists of N1 strata (sublots) of equal sizes. Of these, n1 strata are selected from which the primary samples are taken. σ1 is the standard deviation between the means of the N1 strata, and c1 is the unit cost of selecting a stratum for sampling (usually c1 is practically zero, because it only involves deciding from which strata the samples should be taken).

4.1.2. Primary samples
From each selected stratum, n2 primary samples are taken. N2 is the size of the stratum expressed as the number of potential samples that could be taken from each stratum. σ2 is the standard deviation of primary samples (or the within-strata standard deviation), and c2 is the unit cost of taking a primary sample.

4.1.3. Analytical samples
At this level, n3 is the number of analytical samples prepared from each primary sample, N3 is the size of the primary sample expressed as the number of potential analytical samples that could be prepared from it, σ3 is the standard deviation of the preparation of the analytical samples (i.e., the standard deviation between analytical samples taken from a primary sample), and c3 is the unit cost of preparing an analytical sample.
For optimization purposes, the unit costs, ci, can be given either as currency units or as relative costs, e.g., as the time required to carry out the given sampling operation.
Because the strata and the units within the strata are in most cases autocorrelated, especially in process analysis, and the sampling variances depend on the sampling strategy (systematic, stratified, or random selection), the variances should in general be estimated by using Gy's variographic method. Analysis of variance based on the design shown in Fig. 4 is recommended in many statistical textbooks as the method for estimating the variance components. Because this method does not take autocorrelation into account, it should be used only if strictly random sample selection is used (not recommended) or if there is no autocorrelation between the sampling units.
Because the strata have equal sizes, the mean of the lot can be calculated as the unweighted mean of the analytical results.

Total number of samples analyzed:

$$n_t = n_1 n_2 n_3 \tag{6}$$

Mean of the lot:

$$\bar{x} = \frac{1}{n_1 n_2 n_3} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \sum_{k=1}^{n_3} x_{ijk} \tag{7}$$

Variance of the lot mean:

$$\sigma_{\bar{x}}^2 = \frac{N_1 - n_1}{N_1 - 1} \frac{\sigma_1^2}{n_1} + \frac{N_2 - n_2}{N_2 - 1} \frac{\sigma_2^2}{n_1 n_2} + \frac{N_3 - n_3}{N_3 - 1} \frac{\sigma_3^2}{n_1 n_2 n_3} \tag{8a}$$

Eq. (8a) shows that if a sample can be taken from every stratum (N1 = n1), the between-strata variance is completely eliminated from the variance of the mean. On the other hand, if at all levels the sample is small in comparison to the lot from which it is taken, this equation simplifies to

$$\sigma_{\bar{x}}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_1 n_2} + \frac{\sigma_3^2}{n_1 n_2 n_3}, \quad \text{if all } n_i \ll N_i \tag{8b}$$

Total cost of the investigation:

$$c_t = n_1 c_1 + n_1 n_2 c_2 + n_1 n_2 n_3 c_3 \tag{9}$$

This system can be optimized in two ways: either the maximum tolerable variance is specified first and the total cost is minimized, or the total cost is fixed and the variance of the mean is minimized. An exact mathematical solution for this optimization problem cannot be derived, because the number of samples taken can only be an integer. The optimum can be found, however, either by checking all feasible solutions, which is relatively easy given the speed of modern computers, or by using the approximate mathematical solution given by Sommer [5]. The approximate solution can be derived by assuming that all ni are continuous instead of integers and that all Ni ≫ ni. The mathematical solution is presented below.

4.1.3.1. Maximum costs, cmax, fixed, variance of the mean minimized. For the levels below the first level, the number of samples to be taken can be evaluated by using the formula

$$n_i = \frac{s_i}{s_{i-1}} \sqrt{\frac{c_{i-1}}{c_i}}, \quad \text{constrained to integers } 1 \le n_i \le N_i, \text{ for levels } i > 1 \tag{10}$$

By substituting the values for ni (i > 1) in Eq. (9), n1 can now be solved:

$$n_1 = \frac{c_{max}}{c_1 + n_2 c_2 + n_2 n_3 c_3}, \quad \text{rounded to the lower integer, } 1 \le n_1 \le N_1 \tag{11}$$

4.1.3.2. Target value for the variance of the mean fixed, total cost minimized. If σT² is the target value of the variance of the mean of the procedure, then the protocol has to be designed so that the variance of the mean of the selected procedure satisfies σx̄² ≤ σT².
For levels i > 1, Eq. (10) also applies in this case. The number of sublots to be sampled at level 1 can now be solved either from Eq. (8a) (exact solution, Eq. (12a)) or from Eq. (8b) (approximate solution, Eq. (12b), when all Ni ≫ ni):

$$n_1 = \frac{\frac{N_1}{N_1 - 1} \sigma_1^2 + \frac{N_2 - n_2}{N_2 - 1} \frac{\sigma_2^2}{n_2} + \frac{N_3 - n_3}{N_3 - 1} \frac{\sigma_3^2}{n_2 n_3}}{\sigma_T^2 + \frac{\sigma_1^2}{N_1 - 1}} \tag{12a}$$

$$n_1 = \frac{\sigma_1^2 + \frac{\sigma_2^2}{n_2} + \frac{\sigma_3^2}{n_2 n_3}}{\sigma_T^2} \tag{12b}$$

These solutions have to be rounded up to the nearest integer, 1 ≤ n1 ≤ N1.

Example 4 (Determination of cobalt in nickel cathodes). Assume that the size of a lot consisting of cathode nickel is 25 tonne and that, according to the specifications, the cobalt content must not exceed 150 g/tonne. The average weight of the cathode plates produced is 50 kg, and the plates are cut into pieces of approximately 50 g before packing into barrels for shipment. For the analysis, these 50-g pieces are taken as primary samples, and from them a 1-g sample is dissolved for the cobalt determination. The cost of one analytical determination is 12 €, and that of taking one primary sample (a 50-g piece from a given plate) is 2 €. The standard deviations have been estimated as between-plates standard deviation s1 = 35 g/tonne, within-plate standard deviation (standard deviation of 50-g pieces taken from a single plate) s2 = 15 g/tonne, and the standard deviation between 1-g samples taken from a single 50-g piece, s3 = 3.3 g/tonne. Optimize the analytical procedure so that the standard deviation of the lot mean does not exceed 5 g/tonne. Sampling is carried out at three different error-generating levels. The following values apply to these:

level 1: N1 = 25000 kg/50 kg = 500, s1 = 35 g/tonne, c1 = 0 €, n1 = ?
level 2: N2 = 50 kg/50 g = 1000, s2 = 15 g/tonne, c2 = 2 €, n2 = ?
level 3: N3 = 50 g/1 g = 50, s3 = 3.3 g/tonne, c3 = 12 €, n3 = ?
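The nested-plan formulas (Eqs. (8a), (9), (10), (12a), and (12b)) can be evaluated directly for the three levels just listed. The Python sketch below is one way to do so; the names are illustrative, Eq. (12b) is written as Eq. (8b) solved for n1, and rounding Eq. (10) to the nearest integer before clipping it to at least 1 is an assumption about how the integer constraint is applied.

```python
import math

# Example 4 inputs for levels 1..3 (g/tonne for s, currency units for c).
N = [500, 1000, 50]
s = [35.0, 15.0, 3.3]
c = [0.0, 2.0, 12.0]
s_target = 5.0

def lower_level_count(i):
    """Eq. (10) for level index i >= 1 (0-based): (raw value, integer used)."""
    raw = (s[i] / s[i - 1]) * math.sqrt(c[i - 1] / c[i])
    # Assumption: round to nearest, then clip into the range 1..N_i.
    return raw, min(max(1, round(raw)), N[i])

def variance_of_mean(n):
    """Eq. (8a) for sample counts n = [n1, n2, n3]."""
    var, n_prod = 0.0, 1
    for N_i, n_i, s_i in zip(N, n, s):
        n_prod *= n_i
        var += (N_i - n_i) / (N_i - 1) * s_i ** 2 / n_prod
    return var

def n1_exact(n2, n3):
    """Eq. (12a): Eq. (8a) solved for n1 at the target variance."""
    lower = ((N[1] - n2) / (N[1] - 1) * s[1] ** 2 / n2
             + (N[2] - n3) / (N[2] - 1) * s[2] ** 2 / (n2 * n3))
    numerator = N[0] / (N[0] - 1) * s[0] ** 2 + lower
    return numerator / (s_target ** 2 + s[0] ** 2 / (N[0] - 1))

def n1_approx(n2, n3):
    """Eq. (12b): Eq. (8b) solved for n1, valid when all N_i >> n_i."""
    return (s[0] ** 2 + s[1] ** 2 / n2 + s[2] ** 2 / (n2 * n3)) / s_target ** 2

raw2, n2 = lower_level_count(1)
raw3, n3 = lower_level_count(2)
n1_a = n1_exact(n2, n3)
n1_b = n1_approx(n2, n3)

# Protocol discussed in the text: every ninth plate, i.e. 55 plates per lot.
n1_used = 55
sd_used = math.sqrt(variance_of_mean([n1_used, n2, n3]))
total_cost = n1_used * (c[0] + n2 * c[1] + n2 * n3 * c[2])

print(f"Eq. (10): n2 raw = {raw2:.2f} -> {n2}, n3 raw = {raw3:.2f} -> {n3}")
print(f"n1 exact = {n1_a:.1f}, n1 approximate = {n1_b:.1f}")
print(f"n1 = {n1_used}: sd of lot mean = {sd_used:.1f} g/tonne, cost = {total_cost:.0f}")
```

Because the plan has so few integer variables, the same functions can also be looped over all feasible (n1, n2, n3) combinations to find the cheapest plan that meets the target by exhaustive search.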
At level 1, the unit cost is practically zero. At this level, the only sampling procedure is the decision from which plates the 50-g primary samples should be taken.

Solution. Eq. (10) gives the results n3 = 0.09, so by applying the constraints we have to select n3 = 1, and n2 = 0, so we have to select n2 = 1.
For level 1, the following results are obtained: n1 = 53.3 (Eq. (12a)) or n1 = 58.4 (Eq. (12b)). The approximate solution slightly overestimates the required number of plates to be sampled, but as long as the size of the sample at any level is, say, 10% or less of the lot size, the difference is small.
In this case, the following sampling protocol could be used. A 50-g piece is taken from every ninth cathode plate at the packing stage. This gives a total of 55 samples per lot of 25 tonne. From each 50-g piece, one cobalt determination is made. The total cost of this inspection scheme is 55(0 € + 2 € + 12 €) = 770 €. The expected standard deviation of the lot mean (from Eq. (8a)) is 4.9 g/tonne.
Fig. 5 shows the Operation Characteristics curve of this inspection scheme. It shows both the producer's and the buyer's risk when this inspection scheme is used.

Fig. 5. Operation Characteristics of the optimized inspection protocol for cobalt determination. In this figure, the x-axis shows the true mean value of the lot, and the y-axis, the probability that the mean value of the inspection exceeds the specification 150 g/tonne (producer's risk) or the probability that the inspection value is below the specification (buyer's risk).

4.2. Optimization of sampling plan, when the lot consists of strata of different sizes and heterogeneities

Sometimes the lot to be investigated consists of well-defined strata of different sizes and heterogeneities; for example, when the batch to be processed is prepared by mixing raw materials of different qualities to achieve the required average composition. This kind of lot is shown in Fig. 6.

Fig. 6. Lot consisting of k strata of different sizes, and the quantities needed to optimize the sampling plan.

The lot consists of k different strata. The quantities needed to design a cost-optimal sampling plan are

$$W_i = \frac{M_{Li}}{\sum M_{Li}} = \text{relative size of stratum } i \tag{14}$$

where MLi are the sizes of the strata (e.g., as mass or volume; i = 1, 2, ..., k); Ni, the size of stratum i expressed as the number of potential samples that could be taken from it, Ni = MLi/MSi; MSi, the size of the samples taken from stratum i; σi, the standard deviation of one sample taken from stratum i; ci, the cost of one sample analyzed from stratum i; ct, the total cost of the estimation of the grand mean of the lot; ni, the number of samples taken and analyzed from stratum i; nt, the total number of samples analyzed = Σni; and xij, the analytical results on samples from stratum i (i = 1, 2, ..., k; j = 1, 2, ..., ni). The mean of stratum i and the grand mean of the lot are

$$\bar{x}_i = \frac{\sum_j x_{ij}}{n_i}, \qquad \bar{\bar{x}} = \sum_{i=1}^{k} W_i \bar{x}_i$$

Variance of the lot mean:

$$\sigma_{\bar{\bar{x}}}^2 = \sum W_i^2 \frac{N_i - n_i}{N_i - 1} \frac{\sigma_i^2}{n_i} \tag{15a}$$

If the samples taken are small in comparison to the stratum size (as is usually the case), this equation simplifies to

$$\sigma_{\bar{\bar{x}}}^2 \approx \sum W_i^2 \frac{\sigma_i^2}{n_i}, \quad \text{if in all strata } n_i \ll N_i \text{ and } N_i \gg 1 \tag{15b}$$

The total cost of the investigation in the general case is

$$c_t = \sum_{i=1}^{k} n_i c_i \tag{16a}$$

Usually in practice, the costs of sampling and analysis are independent of the strata from which the samples are taken, and this equation simplifies to

$$c_t = n_t c^*, \quad \text{if } c_1 = c_2 = \ldots = c_k = c^* \tag{16b}$$

Optimizing the investigation means allocating the total number of samples that can be analyzed optimally between the strata. The mathematical optimum can again be derived by assuming that ni is continuous. The results given below have been derived assuming that investigation costs are independent of strata (Eq. (16b) valid). The optimization strategy depends on how much information is available. If only the sizes of the strata are known and the total cost ct of the investigation is fixed, then the best strategy is to allocate the samples proportionally to the sizes of the strata:

$$n_i = W_i n_t \tag{17}$$

and

$$n_t = \frac{c_t}{c^*} \tag{18}$$

Both nt and ni have to be rounded to integers so that the total cost will not be exceeded.
If the unit costs and standard deviations are available, then even better plans can be designed. Laboratories usually follow their costs and, consequently, good cost estimates are available. The standard deviations can be estimated either by using Gy's sampling theory, if the material properties needed are available, or experimentally from a pilot study. If the quality control of the analytical laboratory is well planned, it also provides data that can be used for optimization of sampling and analytical procedures.
A cost-optimal plan for the investigation can be derived in two ways. Either the total cost of the investigation is fixed and the variance of the grand mean of the lot is minimized, or the target value for the variance is given and the total cost is minimized. By assuming that Eqs. (15b) and (16b) are valid, that is, in all strata the samples are small in comparison to the sizes of the strata and the cost of investigation is independent of strata, the following results can be derived.

4.2.1. Maximum value, cmax, given to the total cost, variance of the lot mean minimized

$$n_i = \frac{W_i \sigma_i}{\sum_{i=1}^{k} W_i \sigma_i} \, \frac{c_{max}}{c^*} \tag{19}$$

Here, ni has to be rounded to integers so that the target cost is not exceeded.

4.2.2. Target value, σT, given to the standard deviation of the lot mean, total cost minimized

$$n_i = \frac{W_i \sigma_i}{\sigma_T^2} \sum_{i=1}^{k} W_i \sigma_i \tag{20}$$

Again, ni has to be rounded to integers so that the required standard deviation of the lot mean will not be exceeded, i.e., the variance of the grand mean satisfies σ² ≤ σT².

Example 5 (Optimal design for estimating sulfur balance of a pulp mill). Estimation of the sulfur balance for a pulp mill is a difficult task (Fig. 7). Sulfur enters the mill in raw materials (water, wood) and in chemicals. The outflowing streams consist of products, wastewater, solid wastes, and atmospheric emissions. Initial calculations showed that the mean values of all streams except the emissions into the atmosphere could be estimated reliably. Atmospheric emissions comprised about one quarter of the total sulfur. Estimation of atmospheric sulfur emissions in an old pulp mill, like the one where this study was carried out, is difficult, and therefore optimization of the emission measurement plan is a challenging task. This is because there is a large number of gaseous outlets into the atmosphere. These have different mass flows, the concentrations of the sulfur compounds are highly variable, and sulfur is found in dust and in many gaseous compounds [SO2, H2S, CH3SH, (CH3)2S, and (CH3)2S2]. In the optimization, all the different emission sources and sulfur-containing compounds analyzed had to be treated as separate strata.

Fig. 7. To estimate the uncertainty of the annual sulfur balance in a pulp factory, the standard deviations of the means of several material streams have to be estimated. The streams have different sizes and contain sulfur in many different compounds, and in many streams high variability is characteristic of the sulfur-containing compounds. This is especially the case for atmospheric emissions.

4.2.2.1. Design requirements.

• Material balance for sulfur required annually
• A two-man team with portable instruments available for atmospheric emission control measurements
• To avoid too much time spent in transporting and setting up the instruments, a one-week measurement period is carried out at each location where the team is working; that is, 52 periods/year are available and should be allocated optimally
• For optimization, the emission sources were grouped into six groups, which could be measured during 1 week from one station in the field

4.2.2.2. Acquisition of basic data. Existing records supplemented with new experiments were used to estimate:

• Relative contribution (size) of all major emission sources to the total emission of sulfur compounds
• Variance estimates for the different emission sources (e.g., from a variogram or from analysis of variance).

As an example, Fig. 8 shows the SO2 emissions from the soda recovery boiler as the process heterogeneity (relative variation about the process mean). As can be seen, a high variability is characteristic of emissions from this source. In the peaks, the SO2 concentration is three times higher than the process average. From the heterogeneity values, a variogram (Fig. 9) can be calculated for the process. Fig. 9 also shows the relative standard deviation estimates for the sampling error in this process stream as a function of the sampling interval, both for the systematic and the stratified sample selection. The variogram and standard deviation estimates were calculated by using Gy's method. With the standard deviation estimates, the uncertainty of the process mean during the measurement period can be estimated. This measurement plan can be treated as a two-level hierarchical sampling plan, where one sample is analyzed from each of the $N_1$ strata. By substituting $N_1 = n_1$, $n_2 = 1$ (and $N_2 \gg n_2$) and $N_3 = n_3 = 0$ in Eq. (8a), the variance of the mean of the measurement period is obtained as

$$\sigma^2_{\bar{x}} = \frac{N_1 - n_1}{N_1 - 1} \cdot \frac{\sigma_1^2}{n_1} + \frac{N_2 - n_2}{N_2 - 1} \cdot \frac{\sigma_2^2}{n_1 n_2} \approx \frac{s_2^2}{n_1},$$

where $s_2$ is the standard deviation estimate from Fig. 9 for the sampling frequency used, and $n_1$ is the number of samples analyzed.

Fig. 8. Heterogeneity (relative variation about the mean) of SO2 in emissions from a soda recovery boiler during a measurement period.

Fig. 9. Variogram calculated from the heterogeneities of Fig. 8 (upper part) and relative standard deviation estimates for systematic and stratified sample selection as a function of the sampling interval calculated from the variogram (lower part).

4.2.2.3. Optimization procedure. Optimization was carried out at two levels:

(1) For each group having more than one source, the analytical resources were optimally allocated within the one-week (5 days) measurement period between the different sources (an example is given in Table 1).
(2) Of the 52 measurement weeks, two were allocated for the insignificant sources, where only occasional measurements need to be carried out, and 50 weeks were allocated to the six most important groups. The results of the final optimization are given in Table 2.

Table 1
Relative sizes, relative standard deviation of one sample/measurement period, allocation of 32 samples (measurement periods) that can be made during one week, and relative standard deviations of the mean of 1 week in emission group #6

Stratum no.   W_i     s_ri (%)   n_i   s_r(x̄) (%)
1             0.143   30         5     13.4
2             0.285   25         9     8.3
3             0.002   40         1     40
4             0.285   25         9     8.3
5             0.285   25         9     8.3
Total         1.000              32

Relative standard deviation of the mean of 1 week, $s_r(\bar{\bar{x}}) = 4.5\%$

Table 2
Relative sizes, total relative standard deviation between the means of 1-week measurement periods, and allocation of 50 measurement weeks between the different emission source groups

Emission group no.   W_i    s_ri (%)   n_i
1                    0.14   14.6       10
2                    0.18   20.8       17
3                    0.04   14.8       3
4                    0.02   7.9        1
5                    0.02   10.9       1
6                    0.60   6.4        18

Standard deviation of the annual mean of sulfur emission, $s_r(\bar{\bar{x}}) = 1.5\%$
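The figures in Tables 1 and 2 can be checked with a short script. The following is only a sketch, not the author's program: the function names are hypothetical, the budget ratio $c_{\max}/c^*$ of Eq. (19) is replaced by the number of available measurement periods, and the variance of the stratified mean is assumed to have the form $\sum_i W_i^2 s_i^2 / n_i$ (samples small compared with the strata), an assumption that does reproduce the tabulated 4.5% and 1.5%.

```python
import math

def optimal_allocation(weights, rel_sds, n_total):
    """Unrounded n_i per Eq. (19); n_total stands in for c_max / c*."""
    products = [w * s for w, s in zip(weights, rel_sds)]
    scale = n_total / sum(products)
    return [p * scale for p in products]

def rel_sd_of_mean(weights, rel_sds, counts):
    """Relative standard deviation (%) of the stratified mean.

    Assumption: variance = sum(W_i^2 * s_i^2 / n_i), i.e. the samples are
    small in comparison to the sizes of the strata.
    """
    return math.sqrt(sum((w * s) ** 2 / n
                         for w, s, n in zip(weights, rel_sds, counts)))

# Table 1: sources within emission group 6 (values copied from the table)
w1 = [0.143, 0.285, 0.002, 0.285, 0.285]
s1 = [30, 25, 40, 25, 25]
n1 = [5, 9, 1, 9, 9]

# Table 2: the six emission groups (values copied from the table)
w2 = [0.14, 0.18, 0.04, 0.02, 0.02, 0.60]
s2 = [14.6, 20.8, 14.8, 7.9, 10.9, 6.4]
n2 = [10, 17, 3, 1, 1, 18]

ideal1 = optimal_allocation(w1, s1, 32)   # unrounded allocation, Table 1
ideal2 = optimal_allocation(w2, s2, 50)   # unrounded allocation, Table 2
sd1 = rel_sd_of_mean(w1, s1, n1)          # uses the tabulated integer n_i
sd2 = rel_sd_of_mean(w2, s2, n2)

print([round(x, 2) for x in ideal1], round(sd1, 1))
print([round(x, 2) for x in ideal2], round(sd2, 1))
```

Under these assumptions the unrounded allocations fall close to the tabulated integers. The smallest stratum of Table 1 would receive far less than one measurement by the formula, so the tabulated value of 1 presumably reflects the practical need to measure every source at least once.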

Table 2 shows that, regardless of the high heterogeneity of sulfur in gaseous emissions, the mean can be estimated fairly accurately at an annual level. When the emission measurements were carried out approximately according to the optimized plan, the unaccounted sulfur in the annual material balance was only about 1%, which is in good agreement with the theoretical calculation considering the complexity of the process.

5. Conclusions

Sampling theory can, and should, be applied in all steps of analytical procedures, from the planning to the analytical measurements. At the moment, it is largely neglected. A large amount of analytical resources is devoted to quality control and environmental emission estimation in industries. At national levels, large programs are devoted, e.g., to monitoring the state of the environment and to guaranteeing the quality and safety of food. Only seldom are the data quality objectives adequately defined before the sampling campaigns, and consequently, the sampling theory is not utilized when these campaigns are designed. Often, the sampling plans have just evolved and their performance has never been critically audited. This leaves a lot of room for optimization in this field.

Before designing a sampling and analytical plan, one should carefully consider what uncertainty level is tolerated by the user of the results. If the ambition level is set too high, the investigation will be unnecessarily expensive. In general, in cost-benefit considerations, the following rule applies, because the variance of a mean is inversely proportional to the number of measurements: if the total standard deviation is cut in half, the cost will increase four times, and if only one quarter of the original standard deviation can be tolerated, the cost will be 16 times higher, etc. Importantly, this relationship only holds if the analytical procedure has been optimized. If the resources are not optimally allocated, the required uncertainty level may not be achieved at all. On the other hand, an optimized version of an existing procedure may be cheaper and still give more reliable results than the original procedure. An example: a pulp factory pumped the pulp slurry directly to a paper factory through a pipeline. The pulp factory had installed a process analyzer in the pipeline, and the amount of pulp pumped to the paper factory (and the bill for sold pulp) was based on this measurement. When a discrepancy in the mass balances of paper produced and pulp received was noticed, the sampling and analytical chain was carefully checked. It turned out that, partly due to sampling problems and partly due to calibration problems of the analyzer, the estimated amount of pulp could be 10% too high. As the paper factory consumed about 80 000 tonnes of pulp per year, the analytical error was really expensive. By improving the sampling and calibration procedure, the systematic errors could be removed and the random errors were minimized to a level where their effect on the mean of the annual mass of pulp pumped was only a few tenths of a percent.

Acknowledgements

I want to thank my friends, Dr. Pierre Gy for developing the important sampling theory and Prof. Kim Esbensen for his encouragement in writing this paper. I am also greatly indebted to numerous colleagues and coworkers with whom I have worked on sampling problems over the years.

References

[1] P.M. Gy, Sampling of Particulate Materials, Theory and Practice, Elsevier, Amsterdam, 1982.
[2] P.M. Gy, Sampling of Heterogeneous and Dynamic Material Systems, Elsevier, Amsterdam, 1992.
[3] P.M. Gy, Sampling for Analytical Purposes, Wiley, Chichester, 1998.
[4] F.F. Pitard, Pierre Gy's Sampling Theory and Sampling Practice, Second edition, CRC Press, Boca Raton, 1993.
[5] K. Sommer, Probenahme von Pulvern und körnigen Massengütern, Springer, Berlin, 1979.
[6] C.O. Ingamells, F.F. Pitard, Applied Geochemical Analysis, Wiley, New York, 1986.
[7] W.G. Cochran, Sampling Techniques, Third edition, Wiley, New York, 1977.
[8] P. Minkkinen, SAMPEX - a computer program for solving sampling problems, Chemolab 7 (1989) 189-194.
