Chapter Three
DATA HANDLING AND SPREADSHEETS
IN ANALYTICAL CHEMISTRY
"Facts are stubborn, but statistics are much more pliable."
—Mark Twain
"43.89% of all statistics are worthless."
—Anonymous
Although data handling normally follows the collection of data in an analysis, it is treated early in the text because a knowledge of statistical analysis will be required as you perform experiments in the laboratory. Also, statistics are necessary to understand the significance of the data that are collected and therefore to set limitations on each step of the analysis. The design of experiments (including size of sample required, accuracy of measurements required, and number of analyses needed) is determined from a proper understanding of what the data will represent.
The availability of spreadsheets to process data has made statistical and other calculations very efficient. You will first be presented with the details of various calculations throughout the text, which are necessary for full understanding of the principles. But spreadsheet calculations will also be introduced throughout to illustrate how to take advantage of this software for routine calculations. We will introduce the principles of the use of spreadsheets in this chapter.
3.1 Accuracy and Precision: There Is a Difference
Accuracy is the degree of agreement between the measured value and the true value. An absolute true value is seldom known. A more realistic definition of accuracy, then, would assume it to be the agreement between a measured value and the accepted true value.
We can, by good analytical technique, such as making comparisons against a known standard sample of similar composition, arrive at a reasonable assumption about the accuracy of a method, within the limitations of the knowledge of the "known" sample (and of the measurements). The accuracy to which we know the value of the standard sample is ultimately dependent on some measurement that will have a given limit of certainty in it.
Fig. 3.1. Accuracy versus precision.
Accuracy is how close you get to the bullseye. Precision is how close the repetitive shots are to one another. It is nearly impossible to have accuracy without good precision.
Good precision does not guarantee accuracy.
"To be sure of hitting the target, shoot first, and call whatever you hit the target."—Ashleigh Brilliant
Determinate or systematic errors are nonrandom and occur when something is wrong with the measurement.
Precision is defined as the degree of agreement between replicate measurements of the same quantity. That is, it is the repeatability of a result. The precision may be expressed as the standard deviation, the coefficient of variation, the range of the data, or as a confidence interval (e.g., 95%) about the mean value. Good precision does not assure good accuracy. This would be the case, for example, if there were a systematic error in the analysis. A weight used to measure each of the samples may be in error. This error does not affect the precision, but it does affect the accuracy. On the other hand, the precision can be relatively poor and the accuracy, more or less by chance, may be good. Since all real analyses are unknown, the higher the degree of precision, the greater the chance of obtaining the true value. It is fruitless to hope that a value is accurate if the precision is poor, and the analytical chemist strives for repeatable results to assure the highest possible accuracy.
These concepts can be illustrated with a target, as in Figure 3.1. Suppose you are at target practice and you shoot a series of bullets that all land in the bullseye (left target). You are both precise and accurate. In the middle target, you are precise (steady hand and eye), but inaccurate. Perhaps the sight on your gun is out of alignment. In the right target you are imprecise and therefore probably inaccurate. So we see that good precision is needed for good accuracy, but it does not guarantee it.
As we shall see later, the more measurements that are made, the more reliable will be the measure of precision. The number of measurements required will depend on the accuracy required and on the known reproducibility of the method.
3.2 Determinate Errors—They Are Systematic
Two main classes of errors can affect the accuracy or precision of a measured quantity. Determinate errors are those that, as the name implies, are determinable and that presumably can be either avoided or corrected. They may be constant, as in the case of an uncalibrated weight that is used in all weighings. Or, they may be variable but of such a nature that they can be accounted for and corrected, such as a buret whose volume readings are in error by different amounts at different volumes.
The error can be proportional to sample size or may change in a more complex manner. More often than not, the variation is unidirectional, as in the case of solubility loss of a precipitate (negative error). It can, however, be random in sign. Such an example is the change in solution volume and concentration occurring with changes in temperature. This can be corrected for by measuring the solution temperature. Such measurable determinate errors are classed as systematic errors.
Some common determinate errors are:
1. Instrumental errors. These include faulty equipment, uncalibrated weights, and uncalibrated glassware.
2. Operative errors. These include personal errors and can be reduced by experience and care of the analyst in the physical manipulations involved. Examples include effervescence and "bumping" during sample dissolution, incomplete drying of samples, and so on. These are difficult to correct for. Other personal errors include mathematical errors in calculations and prejudice in …
3.3 Indeterminate Errors—They Are Random
3.7 Standard Deviation—The Most Important Statistic
…, where N is the number of measurements. In practice, we must calculate the individual deviations from the mean of a limited number of measurements, x̄, in which it is anticipated that x̄ → μ as N → ∞, although we have no assurance this will be so; x̄ is given by Σ(xᵢ/N).
For a set of N measurements, it is possible to calculate N independently variable deviations from some reference number. But if the reference number chosen is the estimated mean, x̄, the sum of the individual deviations (retaining signs) must necessarily add up to zero, and so values of N − 1 deviations are adequate to define the Nth value. That is, there are only N − 1 independent deviations from the mean; when N − 1 values have been selected, the last is predetermined. We have, in effect, used one degree of freedom of the data in calculating the mean, leaving N − 1 degrees of freedom for calculating the precision.
As a result, the estimated standard deviation s of a finite set of experimental data (generally N < 30) more nearly approximates σ if the number of degrees of freedom is substituted for N (N − 1 adjusts for the difference between x̄ and μ):
\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}} \qquad (3.2) \]
The value of s is only an estimate of σ, then, and will more nearly approach σ as the number of measurements increases. Since we deal with small numbers of measurements in an analysis, the precision is necessarily represented by s.
Example 3.7
Calculate the mean and the standard deviation of the following set of analytical results: 15.67, 15.69, and 16.03 g.
Solution
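The arithmetic of Equation 3.2 for these three results can be sketched in a few lines of Python. This is only an illustration: the function name `mean_and_sample_std` is our own, and the results are assumed to be in grams.

```python
import math

def mean_and_sample_std(values):
    """Return the mean and the standard deviation with N - 1 degrees of freedom (Equation 3.2)."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the mean
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return mean, math.sqrt(sum_sq_dev / (n - 1))

results_g = [15.67, 15.69, 16.03]
mean_g, s_g = mean_and_sample_std(results_g)
print(f"mean = {mean_g:.2f} g, s = {s_g:.2f} g")
```

Carrying unrounded intermediate values, as the code does, can shift the last reported digit relative to a hand calculation that rounds the mean first.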
The standard deviation may also be calculated using the following equivalent equation:
\[ s = \sqrt{\frac{\sum x_i^2 - \left(\sum x_i\right)^2 / N}{N - 1}} \qquad (3.3) \]
This is useful for computations with a calculator. Many calculators, in fact, have a standard deviation program that automatically calculates the standard deviation from entered individual data.
See Section 3.15 and Equation 3.17 for another way of estimating s for four or fewer numbers.
The precision improves as the square root of the number of measurements.
Example 3.8
Calculate the standard deviation for the data in Example 3.7 using Equation 3.3.
Solution
The difference of 0.01 g from Example 3.7 is not statistically significant since the variation is at least ±0.2 g. In applying this formula, it is important to keep an extra digit or even two for the calculation.
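A short Python sketch of Equation 3.3 makes the need for extra digits concrete: the numerator is a small difference between two large sums. The function name `std_from_sums` is hypothetical, and the comparison against the standard library's `statistics.stdev` is included only as a cross-check.

```python
import math
import statistics

def std_from_sums(values):
    """Standard deviation from the sum of x and the sum of x squared (Equation 3.3)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    # Small difference of two large numbers: rounding either sum early degrades s
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

results_g = [15.67, 15.69, 16.03]
s_sums = std_from_sums(results_g)
s_reference = statistics.stdev(results_g)  # uses N - 1, as in Equation 3.2
print(f"Equation 3.3: {s_sums:.4f} g, library value: {s_reference:.4f} g")
```

Here the two sums are about 748.69 and 748.60, so rounding either one to four figures would wipe out most of the difference that determines s.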
The standard deviation calculation considered so far is an estimate of the probable error of a single measurement. The arithmetical mean of a series of N measurements taken from an infinite population will show less scatter from the "true value" than will an individual observation. The scatter will decrease as N is increased; as N gets very large the sample average will approach the population average μ, and the scatter approaches zero. The arithmetical mean derived from N measurements can be shown to be √N times more reliable than a single measurement. Hence, the random error in the mean of a series of four observations is one-half that of a single observation. In other words, the precision of the mean of a series of N measurements is inversely proportional to the square root of N of the deviation of the individual values. Thus,
\[ \text{Standard deviation of the mean} = s_{\text{mean}} = \frac{s}{\sqrt{N}} \qquad (3.4) \]
The standard deviation of the mean is sometimes referred to as the standard error.
The standard deviation is sometimes expressed as the relative standard deviation (rsd), which is just the standard deviation expressed as a fraction of the mean; usually it is given as the percentage of the mean (% rsd), which is often called the coefficient of variation.
Example 3.9
The following replicate weighings were obtained: 29.8, 30.2, 28.6, and 29.7 mg. Calculate the standard deviation of the individual values and the standard deviation of the mean. Express these as absolute (units of the measurement) and relative (% of the measurement) values.
Solution
xᵢ         |xᵢ − x̄|    (xᵢ − x̄)²
29.8       0.2         0.04
30.2       0.6         0.36
28.6       1.0         1.00
29.7       0.1         0.01
Σ 118.3    Σ 1.9       Σ 1.41
\[ \bar{x} = \frac{118.3}{4} = 29.6 \]
\[ s = \sqrt{\frac{1.41}{4 - 1}} = 0.69\ \text{mg (absolute)}; \qquad \frac{0.69}{29.6} \times 100\% = 2.3\%\ \text{(coefficient of variation)} \]
\[ s_{\text{mean}} = \frac{0.69}{\sqrt{4}} = 0.34\ \text{mg (absolute)}; \qquad \frac{0.34}{29.6} \times 100\% = 1.1\%\ \text{(relative)} \]
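The quantities asked for in Example 3.9 can be cross-checked with Python's standard `statistics` module. This is a sketch, not part of the original solution; the variable names are our own, and `statistics.stdev` is used because it divides by N − 1 as Equation 3.2 does.

```python
import math
import statistics

weights_mg = [29.8, 30.2, 28.6, 29.7]

mean_mg = statistics.mean(weights_mg)
s_mg = statistics.stdev(weights_mg)                # standard deviation of individual values
s_mean_mg = s_mg / math.sqrt(len(weights_mg))      # standard deviation of the mean (Equation 3.4)
rsd_percent = 100 * s_mg / mean_mg                 # coefficient of variation
rsd_mean_percent = 100 * s_mean_mg / mean_mg       # relative standard deviation of the mean

print(f"s = {s_mg:.2f} mg ({rsd_percent:.1f}%)")
print(f"s(mean) = {s_mean_mg:.2f} mg ({rsd_mean_percent:.1f}%)")
```

Because the code carries unrounded intermediate values, its last digit may differ slightly from a hand calculation that rounds the mean and s at each step.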
The precision of a measurement can be improved by increasing the number of observations; the scatter of the mean becomes smaller as the number of observations is increased and would approach zero as the number of observations approached infinity. However, as seen above (Equation 3.4), the deviation of the mean does not decrease in direct proportion to the number of observations, but instead it decreases as the square root of the number of observations. A point will be reached where a slight increase in precision will require an unjustifiably large increase in the number of observations. For example, to decrease the standard deviation by a factor of 10 requires 100 times as many observations.
The practical limit of useful replication is reached when the standard deviation of the random errors is comparable to the magnitude of the determinate or systematic errors (unless, of course, these can be identified and corrected for). This is because the systematic errors in a determination cannot be removed by replication.
The significance of s in relation to the normal distribution curve is shown in Figure 3.2. The mathematical treatment from which the curve was derived reveals that 68% of the individual deviations fall within one standard deviation (for an infinite population) from the mean, 95% are less than twice the standard deviation, and 99% are less than 2.5 times the standard deviation. So, a good approximation is that 68% of the individual values will fall within the range x̄ ± s, 95% will fall within x̄ ± 2s, 99% will fall within x̄ ± 2.5s, and so on.
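The percentages quoted for the normal distribution can be checked numerically. For a Gaussian population, the fraction of values within k standard deviations of the mean is erf(k/√2); the sketch below evaluates this with Python's `math.erf`. The helper name `fraction_within` is our own.

```python
import math

def fraction_within(k):
    """Fraction of a normal (Gaussian) population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1.0, 2.0, 2.5):
    print(f"within {k} standard deviations: {100 * fraction_within(k):.1f}%")
```

The computed fractions (about 68.3%, 95.4%, and 98.8%) agree with the rounded 68%, 95%, and 99% used in the text, and they apply only to an infinite population free of determinate error.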
Actually, these percentage ranges were derived assuming an infinite number of measurements. There are then two reasons why the analyst cannot be 95% certain that the true value falls within x̄ ± 2s. First, one makes a limited number of measurements, and the fewer the measurements, the less certain one will be. Second, the normal distribution curve assumes no determinate errors, but only random errors. Determinate errors, in effect, shift the normal error curve from the true value. An estimate of the actual certainty that a number falls within ±s can be obtained from a calculation of the confidence limit (see below).
It is apparent that there are a variety of ways in which the precision of a number can be reported. Whenever a number is reported as x̄ ± x, you should always qualify under what conditions this holds, that is, how you arrived at ±x. It may, for example, represent s, 2s, s (mean), or the coefficient of variation.
"Randomness is required to make statistical calculations come out right."—Anonymous
The true value will fall within x̄ ± 2s 95% of the time for an infinite number of measurements. See the confidence limit and Example 3.15.
The variance equals s².
A term that is sometimes useful in statistics is the variance. This is the square of the standard deviation, s². We shall use this in determining the propagation of error and in the F test below (Section 3.13).
3.8 Use of Spreadsheets in Analytical Chemistry
A spreadsheet is a powerful software program that can be used for a variety of functions, such as data analysis and plotting. Spreadsheets are useful for organizing data, doing repetitive calculations, and displaying the calculations graphically or in chart form. They have built-in functions, for example, standard deviation and other statistical functions, for carrying out computations on data that are input by the user. Popular spreadsheet programs include Microsoft Excel, Lotus 1-2-3, and Quattro Pro. All operate basically the same but differ somewhat in specific commands and syntax. Because of its widespread availability and popularity, we will use Excel in our illustrations.
You probably have used a spreadsheet program before and are familiar with the basic functions. But we will summarize here the most useful aspects for analytical chemistry applications. You should refer to the spreadsheet manual for more detailed information. Also, the Excel Help on the tool bar provides specific information. You are referred to the excellent tutorial on using the Excel spreadsheet prepared by faculty at California State University at Stanislaus: gai ‘SdftaraxccVindx im. The basic functions in the spreadsheet are described, including entering data and formulas, formatting cells, graphing, and regression analysis. You will find this very helpful and should definitely read it before continuing. The website yorwknsi/—comesdCHEMSO at Western Kentucky University gives summary instructions for graphing using either Microsoft Excel or Lotus 1-2-3. Go othe exceendou hal en Joust. inks.
A spreadsheet consists of cells arranged in columns (labeled A, B, C, ...) and rows (numbered 1, 2, 3, ...). An individual cell is identified by its column letter and row number, for example, B3. Figure 3.3 has the identifiers typed into some of the cells to illustrate. When the mouse pointer (the cross) is clicked on an individual cell, it becomes the active cell (dark lines around it), and the active cell is indicated at the top left of the formula bar, and the contents of the cell are listed to the right of the equal sign on the bar.
FILLING THE CELL CONTENTS
You may enter text, numbers, or formulas in specific cells. Formulas are the key to the utility of spreadsheets, allowing the same calculation to be applied to many numbers. We will illustrate with calculations of the weights of water delivered by two different 20-mL pipets, from the difference in the weights of a flask plus water and the empty flask. Refer to Figure 3.4 as you go through the steps.
Fig. 3.3. Spreadsheet cells.
Open an Excel spreadsheet by clicking on the Excel icon (or the Microsoft Excel program under Start:Programs). You will enter text, numbers, and formulas. Double click on the specific cell to activate it. Enter as follows (information typed into a cell is entered by depressing the Enter key):
Cell A1: Net weights
Cell A3: Pipet
Cell A4: Weight of flask + water, g
Cell A5: Weight of flask, g
Cell A6: Weight of water, g
You may make corrections by double clicking on a cell and then editing the text. (You can also edit the text in the formula bar.) If you single click, new text replaces the old text. You will have to widen the A cells to accommodate the lengthy text. Do so by placing the mouse pointer on the line between A and B on the row at the top, and dragging it to the right until all the text shows. This moves the other cells to the right.
Cell B3: 1
Cell C3: 2
Cell B4: 47.700
Cell C4: 49.239
Cell B5: 27.687
Cell C5: 29.199
Cell B6: =B4-B5
You can also enter the formula by typing =, then click on B4, then type -, and click on B5. You need to format the cells B4 to C6 to three decimal places. Highlight that block of cells by clicking on one corner and dragging to the opposite corner of the block. In the Menu bar, click on Format:Cells:Number. For Decimal places, type 3, and click OK.
You need to add the formula to cell C6. You can retype it. But there is an easier way, by copying (filling) the formula in cell B6. Place the mouse pointer on the lower right corner of cell B6 and drag it to cell C6. This fills the formula into C6 (or additional cells to the right if there are more pipet columns). You may also fill formulas into highlighted cells by clicking on Edit:Fill:Down (or Right).
Double click on B6. This shows the formula in the cell and outlines the other cells contained in the formula. Do the same for C6. Note that when you activate the cell by either single or double clicking on it, the formula is shown in the formula bar.
Fig. 3.4. Filling cell contents.
SAVING THE SPREADSHEET
Save the spreadsheet you have just created by clicking on File:SaveAs. I like to save documents to the desktop first. Then they can be dragged to whatever file you wish, for example, My Documents. That way they don't get lost. So select Desktop at the top. Give the document a File Name at the bottom, for example, Pipet Calibration. Then click Save. If you wish to place the saved document on a disk, you can drag it from the desktop to the opened disk.
PRINTING THE SPREADSHEET
Click File:Page Setup. Normally, a sheet is printed in the Portrait format, that is, vertically on the 8½ × 11-inch paper. If there are many columns, you may wish to print in Landscape, that is, horizontally. If you want gridlines to print, click on Sheet:Gridlines. Now you are ready to print. Click on Print:OK. Just the working area of the spreadsheet will print, not the column and row identifiers.
RELATIVE VS. ABSOLUTE CELL REFERENCES
In the example above, we used relative cell references in copying the formula. The formula in cell B6 said subtract the cell above from the one above it. The copied formula in C6 said the same for the cells above it.
Sometimes we need to include a specific cell in each calculation, containing, say, a constant. To do this, we need to identify it in the formula as an absolute reference. This is accomplished by placing a $ sign in front of the column and row cell identifiers, for example, $B$2. Placing the sign in front of both assures that whether we move across columns or rows, it will remain an absolute reference.
We can illustrate this by creating a spreadsheet to calculate the means of different series of numbers. Fill in the spreadsheet as follows (refer to Figure 3.5):
A1: Titration means
A3: Titn. No.
B3: Series A, mL
C3: Series B, mL
B4: 39.27
B5: 39.18
B6: 39.30
B7: 39.20
C4: 45.59
C5: 45.55
C6: 45.65
C7: 45.66
A4: 1
We can type in each of the titration numbers (1 through 4), but there are automatic ways of incrementing a string of numbers. Click on Edit:Fill:Series. Check Columns and Linear, and leave Step Value at 1. For Stop Value, enter 4 and click OK. The numbers 2 through 4 are inserted in the spreadsheet. You could also first highlight the cells you want filled (beginning with cell A4). Then you do not have to insert a Stop Value. Another way of incrementing a series is to do it by formula. In cell A5, type =A4+1. Then you can fill down by highlighting from A5 down, and clicking on Edit:Fill:Down. (This is a relative reference.) Or, you can highlight cell A5, click on its lower right corner, and drag it to cell A7. This automatically copies the formula in the other cells.
Fig. 3.5. Relative and absolute cell references.
Now we wish to insert a formula in cell B8 to calculate the mean. This will be the sum divided by the number of titrations (cell A7).
B8: =sum(B4:B7)/$A$7
We place the $ signs in the divisor because it will be an absolute reference that we wish to copy to the right in cell C8. Placing a $ before both the column and row addresses assures that the cell will be treated as absolute whether it is copied horizontally or vertically. The sum(B4:B7) is a syntax in the program for summing a series of numbers, from cell B4 through cell B7. Instead of typing in the cell addresses, you can also type "=sum(", then click on cell B4 and drag to cell B7, and type ")". We have now calculated the mean for series A. We wish to do the same for series B. Highlight cell B8, click on its lower right corner, and drag it to cell C8. Voilà, the next mean is calculated! Double click on cell C8, and you will see that the formula has the same divisor (absolute reference), but the sum is a relative reference. If we had not typed in the $ signs to make the divisor absolute, the formula would have assumed it was relative, and the divisor in cell C8 would be cell B7.
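For readers who think more readily in code than in cell addresses, the distinction between the two kinds of reference can be mimicked in Python. This is an analogy only, not how Excel works internally: the per-column sum plays the role of the relative reference (it changes with the column), while the single shared count plays the role of the absolute reference $A$7. The names `series` and `n_titrations` are our own.

```python
# Titration volumes (mL) as entered in columns B and C of the spreadsheet
series = {
    "A": [39.27, 39.18, 39.30, 39.20],
    "B": [45.59, 45.55, 45.65, 45.66],
}

# One fixed value reused by every column, like the absolute reference $A$7
n_titrations = 4

# The sum changes from column to column, like the relative reference sum(B4:B7) -> sum(C4:C7)
means = {name: sum(volumes) / n_titrations for name, volumes in series.items()}

for name, mean in means.items():
    print(f"Series {name}: mean = {mean:.2f} mL")
```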
USE OF EXCEL STATISTICAL FUNCTIONS
Excel has a large number of mathematical and statistical functions that can be used for calculations in lieu of writing your own formulas. Let's try the statistical functions to automatically calculate the mean. Highlight an empty cell and click fx on the tool bar. The Paste Function window appears. Select Statistical in the Function category. The following window appears:
Select AVERAGE for the Function name. Click OK, and for Number, type B4:B7, and click OK. The same average is calculated as you obtained with your own formula. You can also type in the activated cell the syntax =average(B4:B7). Try it.
Let's calculate the standard deviation of the results. Highlight cell B9. Under the Statistical function, select STDEV for the Function name. Alternatively, you can type the syntax into cell B9, =stdev(B4:B7). Now copy the formula to cell C9. Perform the standard deviation calculation using Equation 3.2 and compare with the Excel values. The calculation for series A is 0.05 mL. The value in the spreadsheet, of course, should be rounded to 0.05 mL.
USEFUL SYNTAXES.
Excel has numerous mathematical and statistical functions or syntaxes that can be used to simplify setting up calculations. Peruse the Function names for the Math & Trig and the Statistical function categories under fx in the toolbar. Some you will find useful for this text are:
Math and trig functions
LOG10      Calculates the base-10 logarithm of a number
PRODUCT    Calculates the product of a series of numbers
POWER      Calculates the result of a number raised to a power
SQRT       Calculates the square root of a number
Statistical functions
AVERAGE    Calculates the mean of a series of numbers
MEDIAN     Calculates the median of a series of numbers
STDEV      Calculates the standard deviation of a series of numbers
TTEST      Calculates the probability associated with Student's t test
VAR        Calculates the variance of a series of numbers
The syntaxes may be typed, followed by the range of cells in parentheses, as we did above.
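If you prefer to check a spreadsheet result outside Excel, most of the functions listed above have close counterparts in Python's standard library. The sketch below pairs each Excel name with one such counterpart, using the Series B titration volumes as sample data; the pairing is our own suggestion rather than anything built into Excel, and `math.prod` assumes Python 3.8 or later.

```python
import math
import statistics

series_b_ml = [45.59, 45.55, 45.65, 45.66]

average = statistics.mean(series_b_ml)        # AVERAGE
median = statistics.median(series_b_ml)       # MEDIAN
stdev = statistics.stdev(series_b_ml)         # STDEV (N - 1 in the denominator)
variance = statistics.variance(series_b_ml)   # VAR (square of STDEV)

log10_value = math.log10(1000.0)              # LOG10
product = math.prod([2.0, 3.0, 4.0])          # PRODUCT
power = math.pow(2.0, 10)                     # POWER
root = math.sqrt(16.0)                        # SQRT

print(f"mean = {average:.2f} mL, median = {median:.2f} mL, s = {stdev:.2f} mL")
```

TTEST is left out because the standard library has no direct equivalent.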
This tutorial should provide you the basis for other spreadsheet applications. You can write any formula that is in this book into an active cell, and insert appropriate data for calculations. And obviously, we can perform a variety of data analyses. We can prepare plots and charts of the data, for example, a calibration curve of instrument response versus concentration, along with statistical information. We will illustrate this later in the chapter.
3.9 Propagation of Errors—Not Just Additive
When discussing significant figures earlier, we stated that the relative uncertainty in the answer to a multiplication or division operation could be no better than the relative uncertainty in the operator that had the poorest relative uncertainty. Also, the absolute uncertainty in the answer of an addition or subtraction could be no better than the absolute uncertainty in the number with the largest absolute uncertainty. Without specific knowledge of the uncertainties, we assumed an uncertainty of at least ±1 in the last digit of each number.
From a knowledge of the uncertainties in each number, it is possible to estimate the actual uncertainty in the answer. The errors in the individual numbers will propagate throughout a series of calculations, in either a relative or an absolute fashion, depending on whether the operation is a multiplication or division or whether it is an addition or a subtraction.
ADDITION AND SUBTRACTION—THINK ABSOLUTE VARIANCES
Consider the addition and subtraction of the following numbers:
(65.06 ± 0.07) + (16.13 ± 0.01) − (22.68 ± 0.02) = 58.51 (±?)
The uncertainties listed represent the random or indeterminate errors associated with each number, expressed as standard deviations of the numbers. The maximum error of the summation, expressed as a standard deviation, would be ±0.10; that is, it could be either +0.10 or −0.10 if all uncertainties happened to have the same sign. The minimum uncertainty would be 0.00 if all combined by chance to cancel. Both of these extremes are not highly likely, and statistically the uncertainty will fall somewhere in between. For addition and subtraction, absolute uncertainties are additive. The most probable error is represented by the square root of the sum of the absolute variances. That is, the absolute variance of the answer is the sum of the individual variances. For a = b + c − d,
\[ s_a^2 = s_b^2 + s_c^2 + s_d^2 \qquad (3.5) \]
\[ s_a = \sqrt{s_b^2 + s_c^2 + s_d^2} \qquad (3.6) \]
In the above example,
\[ s_a = \sqrt{(\pm 0.07)^2 + (\pm 0.01)^2 + (\pm 0.02)^2} = \sqrt{(49 \times 10^{-4}) + (1 \times 10^{-4}) + (4 \times 10^{-4})} = \sqrt{54 \times 10^{-4}} = \pm 7.3 \times 10^{-2} \]
So the answer is 58.51 ± 0.07. The number ±0.07 represents the absolute uncertainty. If we wish to express it as relative uncertainty, this would be
\[ \frac{\pm 0.07}{58.51} \times 100\% = \pm 0.12\% \]
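The rule of Equation 3.6 is easy to express as a small Python function. The sketch below reproduces the example above; the function name `propagate_add_sub` is our own, and the uncertainties are assumed to be independent standard deviations.

```python
import math

def propagate_add_sub(abs_uncertainties):
    """Absolute uncertainty of a sum or difference: square root of the summed variances (Equation 3.6)."""
    return math.sqrt(sum(s ** 2 for s in abs_uncertainties))

answer = 65.06 + 16.13 - 22.68
s_answer = propagate_add_sub([0.07, 0.01, 0.02])
relative_percent = 100 * s_answer / answer

print(f"{answer:.2f} ± {s_answer:.2f} (relative uncertainty {relative_percent:.2f}%)")
```

Note that the sign of each term in the sum does not matter: a subtracted number contributes its variance just as an added one does.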
Example 3.10
You have received three shipments of uranium ore of equal weight. Analysis of the three ores indicated contents of 3.978 ± 0.004%, 2.536 ± 0.003%, and 3.680 ± 0.003%, respectively. What is the average uranium content of the ores and what are the absolute and relative uncertainties?
Solution
\[ \bar{x} = \frac{(3.978 \pm 0.004\%) + (2.536 \pm 0.003\%) + (3.680 \pm 0.003\%)}{3} \]
The uncertainty in the summation is
\[ s_a = \sqrt{(\pm 0.004)^2 + (\pm 0.003)^2 + (\pm 0.003)^2} = \sqrt{(16 \times 10^{-6}) + (9 \times 10^{-6}) + (9 \times 10^{-6})} = \sqrt{34 \times 10^{-6}} = \pm 5.8 \times 10^{-3}\%\ \text{U} \]
Hence, the absolute uncertainty is
\[ \bar{x} = \frac{10.194}{3} \pm 0.006\% = 3.398 \pm 0.006\%\ \text{U} \]
Note that since there is no uncertainty in the divisor 3, the relative uncertainty in the uranium content is
\[ \frac{5.8 \times 10^{-3}\%\ \text{U}}{3.398\%\ \text{U}} = 2 \times 10^{-3}\ \text{or}\ 0.2\% \]
The absolute variances of additions and subtraction are additive.
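MULTIPLICATION AND DIVISION—THINK RELATIVE VARIANCES
Consider the following operation:
\[ \frac{(13.67 \pm 0.02)(120.4 \pm 0.2)}{4.623 \pm 0.006} = 356.0\ (\pm ?) \]
The relative variances of multiplication and division are additive.
Here, the relative uncertainties are additive, and the most probable error is represented by the square root of the sum of the relative variances. That is, the relative variance of the answer is the sum of the individual relative variances.
For a = bc/d,
\[ (s_a^2)_{\text{rel}} = (s_b^2)_{\text{rel}} + (s_c^2)_{\text{rel}} + (s_d^2)_{\text{rel}} \qquad (3.7) \]
\[ (s_a)_{\text{rel}} = \sqrt{(s_b^2)_{\text{rel}} + (s_c^2)_{\text{rel}} + (s_d^2)_{\text{rel}}} \qquad (3.8) \]
In the above example,
\[ (s_b)_{\text{rel}} = \frac{\pm 0.02}{13.67} = \pm 0.0015 \qquad (s_c)_{\text{rel}} = \frac{\pm 0.2}{120.4} = \pm 0.0017 \qquad (s_d)_{\text{rel}} = \frac{\pm 0.006}{4.623} = \pm 0.0013 \]
\[ (s_a)_{\text{rel}} = \sqrt{(\pm 0.0015)^2 + (\pm 0.0017)^2 + (\pm 0.0013)^2} = \sqrt{(2.2 \times 10^{-6}) + (2.9 \times 10^{-6}) + (1.7 \times 10^{-6})} = \sqrt{6.8 \times 10^{-6}} = \pm 2.6 \times 10^{-3} \]
The absolute uncertainty is given by
\[ s_a = a \times (s_a)_{\text{rel}} = 356.0 \times (\pm 2.6 \times 10^{-3}) = \pm 0.93 \]
So the answer is 356.0 ± 0.9.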
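Equation 3.8 can likewise be sketched in Python. The function name `relative_uncertainty` and the (value, uncertainty) pair layout are our own choices for illustration, and the uncertainties are assumed to be independent.

```python
import math

def relative_uncertainty(terms):
    """Relative uncertainty of a product or quotient (Equation 3.8).

    terms: iterable of (value, absolute uncertainty) pairs, whether multiplied or divided.
    """
    return math.sqrt(sum((s / x) ** 2 for x, s in terms))

b, c, d = (13.67, 0.02), (120.4, 0.2), (4.623, 0.006)
a = b[0] * c[0] / d[0]
rel_a = relative_uncertainty([b, c, d])
s_a = a * rel_a  # convert back to an absolute uncertainty

print(f"a = {a:.1f} ± {s_a:.1f} (relative uncertainty {rel_a:.4f})")
```

The code carries unrounded relative uncertainties, so its absolute uncertainty (about 0.91) is slightly smaller than the 0.93 obtained above from rounded intermediate values; both round to ±0.9.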
Example 3.11
Calculate the uncertainty in the number of millimoles of chloride contained in 250.0 mL of a sample when three equal aliquots of 25.00 mL are titrated with silver nitrate with the following results: 36.78, 36.82, and 36.75 mL. The molarity of the AgNO₃ solution is 0.1167 ± 0.0002 M.
Solution
The mean volume is
\[ \frac{36.78 + 36.82 + 36.75}{3} = 36.78\ \text{mL} \]
The standard deviation is
xᵢ        xᵢ − x̄    (xᵢ − x̄)²
36.78     0.00      0.0000
36.82     0.04      0.0016
36.75     0.03      0.0009
                    Σ 0.0025
\[ s = \sqrt{\frac{0.0025}{3 - 1}} = 0.035 \qquad \text{Mean volume} = 36.78 \pm 0.04\ \text{mL} \]
mmol Cl⁻ titrated = (0.1167 ± 0.0002 mmol/mL)(36.78 ± 0.04 mL) = 4.292 (±?)
\[ (s_b)_{\text{rel}} = \frac{\pm 0.0002}{0.1167} = \pm 0.0017 \qquad (s_c)_{\text{rel}} = \frac{\pm 0.035}{36.78} = \pm 0.00095 \]
\[ (s_a)_{\text{rel}} = \sqrt{(\pm 0.0017)^2 + (\pm 0.00095)^2} = \sqrt{(2.9 \times 10^{-6}) + (0.90 \times 10^{-6})} = \sqrt{3.8 \times 10^{-6}} = \pm 1.9 \times 10^{-3} \]
The absolute uncertainty in the millimoles of Cl⁻ is
4.292 × (±0.0019) = ±0.0082 mmol
mmol Cl⁻ in 25 mL = 4.292 ± 0.0082 mmol
mmol Cl⁻ in 250 mL = 10(4.292 ± 0.0082) = 42.92 ± 0.08 mmol
Note that we retained one extra figure in computations until the final answer. Here, the absolute uncertainty determined is proportional to the size of the sample; it would not remain constant for twice the sample size, for example.
If there is a combination of multiplication/division and addition/subtraction in a calculation, the uncertainties of these must be combined.
Example 3.12
You have received three shipments of iron ore of the following weights: 2852, 1578, and 1877 lb. There is an uncertainty in the weights of ±5 lb. Analysis of the ores gives 36.28 ± 0.04%, 22.68 ± 0.03%, and 49.23 ± 0.06%, respectively. You are to pay $300 per ton of iron. What should you pay for these three shipments and what is the uncertainty in the payment?
Solution
We need to calculate the weight of iron in each shipment, with the uncertainties, and then add these together to obtain the total weight of iron and the uncertainty in this. The relative uncertainties in the weights are
\[ \frac{\pm 5}{2852} = \pm 0.0017 \qquad \frac{\pm 5}{1578} = \pm 0.0032 \qquad \frac{\pm 5}{1877} = \pm 0.0027 \]
The relative uncertainties in the analyses are
\[ \frac{\pm 0.04}{36.28} = \pm 0.0011 \qquad \frac{\pm 0.03}{22.68} = \pm 0.0013 \qquad \frac{\pm 0.06}{49.23} = \pm 0.0012 \]
The weights of iron in the shipments are
\[ \frac{(2852 \pm 5\ \text{lb})(36.28 \pm 0.04\%)}{100} = 1034.7\ (\pm ?)\ \text{lb Fe} \]
\[ (s_a)_{\text{rel}} = \sqrt{(\pm 0.0017)^2 + (\pm 0.0011)^2} = \pm 0.0020 \]
s_a = 1034.7 × (±0.0020) = ±2.1 lb
lb Fe = 1034.7 ± 2.1
(We will carry an additional figure throughout.)
\[ \frac{(1578 \pm 5\ \text{lb})(22.68 \pm 0.03\%)}{100} = 357.89\ (\pm ?)\ \text{lb Fe} \]
\[ (s_a)_{\text{rel}} = \sqrt{(\pm 0.0032)^2 + (\pm 0.0013)^2} = \pm 0.0034 \]
s_a = 357.89 × (±0.0034) = ±1.2 lb
lb Fe = 357.9 ± 1.2
\[ \frac{(1877 \pm 5\ \text{lb})(49.23 \pm 0.06\%)}{100} = 924.05\ (\pm ?)\ \text{lb Fe} \]
\[ (s_a)_{\text{rel}} = \sqrt{(\pm 0.0027)^2 + (\pm 0.0012)^2} = \pm 0.0030 \]
s_a = 924.05 × (±0.0030) = ±2.8 lb
lb Fe = 924.0 ± 2.8
Total Fe = (1034.7 ± 2.1 lb) + (357.9 ± 1.2 lb) + (924.0 ± 2.8 lb) = 2316.6 (±?) lb
\[ s_a = \sqrt{(\pm 2.1)^2 + (\pm 1.2)^2 + (\pm 2.8)^2} = \pm 3.7\ \text{lb} \]
Total Fe = 2317 ± 4 lb
Price = (2316.6 ± 3.7 lb)($0.15/lb) = $347.49 ± 0.56
Hence, you should pay $347.50 ± 0.60.
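A mixed calculation like Example 3.12 is a natural candidate for a short script, because the relative rule (within each shipment) and the absolute rule (across shipments) must be applied in the right order. The following Python sketch is one way to organize it; the function name `iron_in_shipment` is hypothetical, and 1 ton is taken as 2000 lb, which is what the $0.15/lb figure in the solution implies.

```python
import math

# (weight in lb, weight uncertainty in lb, % Fe, % Fe uncertainty) from Example 3.12
shipments = [
    (2852, 5, 36.28, 0.04),
    (1578, 5, 22.68, 0.03),
    (1877, 5, 49.23, 0.06),
]
PRICE_PER_LB = 300 / 2000  # $300 per ton, assuming 1 ton = 2000 lb

def iron_in_shipment(weight, s_weight, percent, s_percent):
    """Return (lb Fe, absolute uncertainty); multiplication, so relative variances add."""
    iron = weight * percent / 100
    rel = math.sqrt((s_weight / weight) ** 2 + (s_percent / percent) ** 2)
    return iron, iron * rel

parts = [iron_in_shipment(*shipment) for shipment in shipments]

# Addition across shipments, so absolute variances add
total_iron = sum(iron for iron, _ in parts)
s_total = math.sqrt(sum(s ** 2 for _, s in parts))

# The price per pound is treated as exact, so it scales value and uncertainty alike
price = total_iron * PRICE_PER_LB
s_price = s_total * PRICE_PER_LB

print(f"Total Fe = {total_iron:.1f} ± {s_total:.1f} lb")
print(f"Price = ${price:.2f} ± {s_price:.2f}")
```

The unrounded values differ from the hand calculation only in the last carried digit and lead to the same payment once rounded.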
Example 3.13
You determine the acetic acid content of vinegar by titrating with a standard (known concentration) solution of sodium hydroxide to a phenolphthalein end point. An approximately 5-mL sample of vinegar is weighed on an analytical balance in a weighing bottle (the increase in weight represents the weight of the sample) and is found to be 5.0268 g. The uncertainty in making a single weighing is ±0.2 mg. The sodium hydroxide must be accurately standardized (its concentration determined) by titrating known weights of high-purity potassium acid phthalate, and three such titrations give molarities of 0.1167, 0.1163, and 0.1164 M. A volume of 36.78 mL of sodium hydroxide is used to titrate the sample. The uncertainty in reading the buret is ±0.02 mL. What is the percent acetic acid in the vinegar, and what is its uncertainty?
Solution
Two weighings are required to obtain the weight of the sample: that of the empty weighing bottle and that of the bottle plus sample. Each has an uncertainty of ±0.2 mg, and so the uncertainty of the net sample weight (the difference of the two weights) is
\[ s_{\text{wt}} = \sqrt{(\pm 0.2)^2 + (\pm 0.2)^2} = \pm 0.3\ \text{mg} \]
The mean of the molarity of the sodium hydroxide is 0.1165 M, and its standard deviation is ±0.0002 M. Similarly, two buret readings (initial and final) are required to obtain the volume of base delivered, and the total uncertainty is
\[ s_{\text{vol}} = \sqrt{(\pm 0.02)^2 + (\pm 0.02)^2} = \pm 0.03\ \text{mL} \]
The moles of acetic acid are equal to the moles of sodium hydroxide used to titrate it, so the percent of acetic acid is
\[ \frac{(0.1165 \pm 0.0002)\ \text{mmol/mL} \times (36.78 \pm 0.03)\ \text{mL} \times 60.05\ \text{mg/mmol}}{(5026.8 \pm 0.3)\ \text{mg}} \times 100\% = 5.119\ (\pm ?)\% \]
The uncertainty in the formula weight of acetic acid is assumed to be negligible (we could actually calculate it to six figures to be exact). The relative uncertainties are
\[ \frac{\pm 0.0002}{0.1165} = \pm 0.0017 \qquad \frac{\pm 0.03}{36.78} = \pm 0.0008 \qquad \frac{\pm 0.3}{5026.8} = \pm 0.00006 \]
The number of significant figures in an answer is determined by the uncertainty due to propagation of error.
The uncertainty in the analysis is
\[ (s_a)_{\text{rel}} = \sqrt{(\pm 0.0017)^2 + (\pm 0.0008)^2 + (\pm 0.00006)^2} = \pm 0.0020 \]
s_a = 5.119 × (±0.0020) = ±0.010% acetic acid
Hence, the acetic acid content is 5.119 ± 0.010%. The relative uncertainty is 2.0 ppt.
The factor that limited the uncertainty the most was the variance in the molarity of the sodium hydroxide solution. This illustrates the importance of careful calibration, which is discussed in Chapter 2.
3.10 Significant Figures and Propagation of Error
We noted earlier that the total uncertainty in a computation determines how
accurately we can know the answer. In other words, the uncertainty sets the number of
significant figures. Take the following example:

(73.1 ± 0.2)(2.245 ± 0.008) = 164.1 ± 0.7

We are justified in keeping four figures, even though the key number has three.
Here, we don't have to carry the additional figure as a subscript since we have
indicated the actual uncertainty in it. Note that the greatest relative uncertainty in the
multipliers is 0.0036, while that in the answer is 0.0043; so, due to the propagation
of error, we know the answer somewhat less accurately than the key number.
The key number (the one with the greatest uncertainty), when actual uncertainties
are known, may not necessarily be the one with the smallest number of digits. For
example, the relative uncertainty in 78.1 ± 0.2 is 0.003, while that in 11.21 ± 0.08
is 0.007.
Suppose we have the following calculation:

(73.1 ± 0.9)(2.245 ± 0.008) = 164.1 ± 2.0 = 164 ± 2

Now the uncertainty in the answer is in the units place, and so figures beyond that
are meaningless. In this instance, the uncertainty in the key number and the
answer are similar (±0.012) since the uncertainty in the other multiplier is
significantly smaller.
Example 3.14
Provide the answers to the following calculations to the proper number of
significant figures:
(a) (38.68 ± 0.07) − (6.16 ± 0.09) = 32.52
(b) (12.18 ± 0.08)(23.04 ± 0.07) / (3.247 ± 0.006) = 86.43
Solution
(a) The calculated absolute uncertainty in the answer is ±0.11. Therefore, the
answer is 32.5 ± 0.1.
(b) The calculated relative uncertainty in the answer is 0.0075, so the absolute
uncertainty is 0.0075 × 86.43 = 0.65. Therefore, the answer is 86.4 ± 0.6, even
though we know all the other numbers to four figures; there is substantial
uncertainty in the fourth digit, which leads to the uncertainty in the answer. The
relative uncertainty in that answer is 0.0075, and the largest relative
uncertainty in the other numbers is 0.0066, very similar.
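The two propagation rules used in this example (absolute uncertainties combine in quadrature for sums and differences, relative uncertainties for products and quotients) can be sketched as two small helper functions. This is an illustrative Python sketch; the function names are assumptions, not anything defined in the text.

```python
import math


def abs_uncertainty_sum(*abs_uncerts):
    """Absolute uncertainty of a sum or difference of independent values."""
    return math.sqrt(sum(u ** 2 for u in abs_uncerts))


def rel_uncertainty_product(*value_uncert_pairs):
    """Relative uncertainty of a product or quotient of independent values.

    Each argument is a (value, absolute_uncertainty) pair.
    """
    return math.sqrt(sum((u / v) ** 2 for v, u in value_uncert_pairs))


# Part (a): a difference, so absolute uncertainties are combined.
answer_a = 38.68 - 6.16
s_a = abs_uncertainty_sum(0.07, 0.09)

# Part (b): a product and quotient, so relative uncertainties are combined.
answer_b = 12.18 * 23.04 / 3.247
rel_b = rel_uncertainty_product((12.18, 0.08), (23.04, 0.07), (3.247, 0.006))
s_b = rel_b * answer_b

print(f"(a) {answer_a:.2f} with absolute uncertainty {s_a:.2f}")
print(f"(b) {answer_b:.2f} with relative uncertainty {rel_b:.4f} "
      f"and absolute uncertainty {s_b:.2f}")
```

The number of figures to report is then chosen from the size of the computed uncertainty, as in the solution above.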
3.11 Control Charts
A quality control chart is a time plot of a measured quantity that is assumed to
be constant (with a Gaussian distribution) for the purpose of ascertaining that the
measurement remains within a statistically acceptable range. It may be a day-to-day
plot of the measured value of a standard that is run intermittently with samples.
The control chart consists of a central line representing the known or assumed
value of the control and either one or two pairs of limit lines, the inner and outer
control limits. Usually the standard deviation of the procedure is known (a good
estimate of σ), and this is used to establish the control limits.
An example of a control chart is illustrated in Figure 3.6, representing a plot
of day-to-day results of the analysis of a pooled serum calcium or a control sample
that is run randomly and blindly with samples each day. A useful inner control
limit is two standard deviations since there is only 1 chance in 20 that an individual
measurement will exceed this purely by chance. This might represent a warning
limit. The outer limit might be 2.5 or 3σ, in which case there is only 1 chance
in 100 or 1 chance in 500 that a measurement will fall outside this range in the absence
of systematic error. Usually, one control is run with each batch of samples (e.g.,
20 samples), so several control points may be obtained each day. The mean of these
may be plotted each day. The random scatter of this would be expected to be smaller
by a factor of √N, compared to individual points.
Particular attention should be paid to trends in one direction; that is, the points
lie largely on one side of the central line. This would suggest that either the
control is in error or there is a systematic error in the measurement. A tendency for
points to lie outside the control limits would indicate the presence of one or more
determinate errors in the determination, and the analyst should check for
deterioration of reagents, instrument malfunction, or environmental and other effects.
[Fig. 3.6. Typical quality control chart (calcium quality control chart for October 2002).]
[Margin note: A control chart is constructed by periodically running a "known" control
sample.]
Trends should signal contamination of reagents, improper calibration or erroneous
standards, or change in the control lot.
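The checks described for a control chart (warning and outer limits, and runs of points on one side of the central line) can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the function names, the choice of 2σ and 3σ limits, and the control values, which are hypothetical and not taken from Figure 3.6.

```python
def classify_control_point(value, center, sigma, warning=2.0, action=3.0):
    """Classify one control result by its distance from the central line."""
    z = abs(value - center) / sigma
    if z > action:
        return "outside outer limit"
    if z > warning:
        return "outside warning limit"
    return "in control"


def longest_run_one_side(values, center):
    """Length of the longest run of consecutive points on one side of center."""
    longest = current = 0
    previous_side = 0
    for v in values:
        side = (v > center) - (v < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == previous_side:
            current += 1
        else:
            current = 1 if side != 0 else 0
        previous_side = side
        longest = max(longest, current)
    return longest


# Hypothetical daily control results (not data from the text).
center, sigma = 5.00, 0.10
daily = [5.05, 5.10, 4.90, 5.02, 5.03, 5.08, 5.01, 5.25, 5.35]

labels = [classify_control_point(v, center, sigma) for v in daily]
run = longest_run_one_side(daily, center)

for v, label in zip(daily, labels):
    print(f"{v:.2f}  {label}")
print("longest run on one side of the central line:", run)
```

A long run on one side is the kind of trend the text says deserves attention even when every point is still inside the limits.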
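3.12 The Confidence Limit—How Sure Are You?
[Margin note: The true value falls within the confidence limit, estimated using t at the
desired confidence level.]
Calculation of the standard deviation for a set of data provides an indication of the
precision inherent in a particular procedure or analysis. But unless there is a large
number of data, it does not by itself give any information about how close the
experimentally determined mean x̄ might be to the true mean value μ. Statistical
theory, though, allows us to estimate the range within which the true value might
fall, within a given probability, defined by the experimental mean and the standard
deviation. This range is called the confidence interval, and the limits of this range
are called the confidence limit. The likelihood that the true value falls within the
range is called the probability, or confidence level, usually expressed as a percent.
The confidence limit is given by

Confidence limit = $\bar{x} \pm \dfrac{ts}{\sqrt{N}}$   (3.9)

where t is a statistical factor that depends on the number of degrees of freedom and the
confidence level desired. The number of degrees of freedom is one less than the number
of measurements. Values of t at different confidence levels and degrees of freedom ν
are given in Table 3.1. Note that the confidence limit is simply the product of t and
the standard deviation of the mean (s/√N). (The confidence limit for a single
observation x, when N is 1, is given by x ± ts, being larger than that of the mean by
a factor √N. Here ν = N − 1 is for the number of measurements used to determine s.)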
Table 3.1
Values of t for ν Degrees of Freedom for Various Confidence Levels(a)

                 Confidence Level
ν        90%       95%       99%       99.5%
1        6.314    12.706    63.657    127.32
2        2.920     4.303     9.925     14.089
3        2.353     3.182     5.841      7.453
4        2.132     2.776     4.604      5.598
5        2.015     2.571     4.032      4.773
6        1.943     2.447     3.707      4.317
7        1.895     2.365     3.500      4.029
8        1.860     2.306     3.355      3.832
9        1.833     2.262     3.250      3.690
10       1.812     2.228     3.169      3.581
15       1.753     2.131     2.947      3.252
20       1.725     2.086     2.845      3.153
25       1.708     2.060     2.787      3.078
∞        1.645     1.960     2.576      2.807

(a) ν = N − 1 = degrees of freedom.
Example 3.15
A soda ash sample is analyzed in the analytical chemistry laboratory by titration
with standard hydrochloric acid. The analysis is performed in triplicate with the
following results: 93.50, 93.58, and 93.43% Na2CO3. Within what range are you
95% confident that the true value lies?
Solution
The mean is 93.50%. The standard deviation s is calculated to be 0.075% Na2CO3
(absolute; calculate it with a spreadsheet). At the 95% confidence level and two
degrees of freedom, t = 4.303 and

Confidence limit = $93.50 \pm \dfrac{4.303 \times 0.075}{\sqrt{3}} = 93.50 \pm 0.19\%$

So you are 95% confident that, in the absence of a determinate error, the true value
falls within 93.31 to 93.69%. Note that for an infinite number of measurements,
we would have predicted with 95% confidence that the true value falls within two
standard deviations (Figure 3.2); we see that for ν = ∞, t is actually 1.96 (Table
3.1), and so the confidence limit would indeed be about twice the standard
deviation of the mean (which approaches σ for large N).
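The confidence-limit calculation of Equation 3.9 is easy to script. The sketch below is a minimal Python illustration, assuming the 95% t values for one to five degrees of freedom copied from Table 3.1; the dictionary and function names are hypothetical.

```python
import math
import statistics

# 95% confidence-level t values, keyed by degrees of freedom (from Table 3.1).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def confidence_limit(data, t_table=T_95):
    """Return (mean, half_width) so that the limit is mean +/- half_width."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation (N - 1)
    half_width = t_table[n - 1] * s / math.sqrt(n)
    return mean, half_width


soda_ash = [93.50, 93.58, 93.43]        # % Na2CO3, triplicate results
mean, half_width = confidence_limit(soda_ash)
print(f"{mean:.2f} +/- {half_width:.2f} % Na2CO3")
```

Extending the dictionary with more rows of the t table, or with another confidence level, is all that is needed to handle larger data sets.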
Remember from Section 3.7 and Figure 3.2 that we are 68% confident that
the true value falls within ±1σ, 95% confident it will fall within ±2σ, and 99%
confident it will fall within ±2.5σ. Note that it is possible to estimate a standard
deviation from a stated confidence interval, and vice versa a confidence interval
from a standard deviation. If a mean value is 7.37 ± 0.06 g at the 95% confidence
interval, then since this is two standard deviations for a suitably large number of
measurements, the standard deviation is 0.03 g. If we know the standard deviation
is 0.03 g, then this is the confidence interval at the 68% confidence level, or it is
0.06 g at the 95% confidence level. For small numbers of measurements, t will be
larger, which proportionately changes these numbers.
As the number of measurements increases, both t and s/√N decrease, with
the result that the confidence interval is narrowed. So the more measurements you
make, the more confident you will be that the true value lies within a given range
or, conversely, that the range will be narrowed at a given confidence level.
However, t decreases exponentially with an increase in N, just as the standard
deviation of the mean does (see Table 3.1), so a point of diminishing returns is eventually
reached in which the increase in confidence is not justified by the increase in the
number of sample analyses required.
[Margin note: Too high a confidence level will give a wide range that may encompass
nonrandom numbers. Too low a confidence level will give a narrow range and exclude
valid random numbers. Confidence levels of 90 to 95% are generally accepted as
reasonable.]
[Margin note: Compare with Figure 3.2, where 95% of the values fall within 2σ.]
3.13 Tests of Significance—Is There a Difference?
In developing a new analytical method, it is often desirable to compare the results
of that method with those of an accepted (perhaps standard) method. The F test
evaluates differences between the spread of results, while the t test looks at
differences between means.
[Margin note: The F test is used to determine if two variances are statistically
different.]
The F test is designed to indicate whether there is a significant difference
between two methods based on their standard deviations. F is defined in terms of
the variances of the two methods, where the variance is the square of the
standard deviation:

$F = \dfrac{s_1^2}{s_2^2}$   (3.10)

where $s_1^2 > s_2^2$. There are two different degrees of freedom, ν1 and ν2, where
degrees of freedom is defined as N − 1 for each case.
If the calculated F value from Equation 3.10 exceeds a tabulated F value at
the selected confidence level, then there is a significant difference between the
variances of the two methods. A list of F values at the 95% confidence level is given
in Table 3.2.
Table 3.2
Values of F at the 95% Confidence Level

          ν1 = 2     3      4      5      6      7      8      9      10     15     20     30
ν2 = 2     19.0   19.2   19.2   19.3   19.3   19.4   19.4   19.4   19.4   19.4   19.4   19.5
     3     9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79   8.70   8.66   8.62
     4     6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.86   5.80   5.75
     5     5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74   4.62   4.56   4.50
     6     5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   3.94   3.87   3.81
     7     4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64   3.51   3.44   3.38
     8     4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35   3.22   3.15   3.08
     9     4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14   3.01   2.94   2.86
    10     4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98   2.85   2.77   2.70
    15     3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54   2.40   2.33   2.25
    20     3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35   2.20   2.12   2.04
    30     3.32   2.92   2.69   2.53   2.42   2.33   2.27   2.21   2.16   2.01   1.93   1.84
Example 3.16
You are developing a new colorimetric procedure for determining the glucose
content of blood serum. You have chosen the standard Folin-Wu procedure with which
to compare your results. From the following two sets of replicate analyses on the
same sample, determine whether the variance of your method differs significantly
from that of the standard method.

Your Method (mg/dL)      Folin-Wu Method (mg/dL)
127                      130
125                      128
123                      131
130                      129
131                      127
126                      125
129
mean (x̄1) 127            mean (x̄2) 128

Solution

$s_1^2 = \dfrac{\sum (x_{i1} - \bar{x}_1)^2}{N_1 - 1} = \dfrac{50}{7 - 1} = 8.3$

$s_2^2 = \dfrac{\sum (x_{i2} - \bar{x}_2)^2}{N_2 - 1} = \dfrac{24}{6 - 1} = 4.8$

$F = \dfrac{8.3}{4.8} = 1.73$

The variances are arranged so that the F value is >1. The tabulated F value for
ν1 = 6 and ν2 = 5 is 4.95. Since the calculated value is less than this, we
conclude that there is no significant difference in the precision of the two methods,
that is, the standard deviations are from random error alone and don't depend on
the sample.
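The F test of Equation 3.10 can be sketched in Python as follows. This is an illustration rather than the text's own procedure: the function name is hypothetical, the replicate values are those of the glucose example as best they can be read, and the script uses unrounded means, so its F value differs slightly from a hand calculation made with the rounded means 127 and 128.

```python
import statistics

your_method = [127, 125, 123, 130, 131, 126, 129]   # mg/dL
folin_wu = [130, 128, 131, 129, 127, 125]           # mg/dL


def f_ratio(a, b):
    """Return (F, nu1, nu2) with the larger variance in the numerator."""
    var_a = statistics.variance(a)    # sample variance, N - 1 in the denominator
    var_b = statistics.variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1


f_calc, nu1, nu2 = f_ratio(your_method, folin_wu)

F_TABLE_95 = 4.95   # tabulated F for nu1 = 6, nu2 = 5 at 95% (Table 3.2)
significant = f_calc > F_TABLE_95

print(f"F = {f_calc:.2f} (nu1 = {nu1}, nu2 = {nu2}); "
      f"significant difference in variance: {significant}")
```

Putting the larger variance in the numerator inside the function avoids having to decide by hand which method is "method 1".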
THE STUDENT t TEST—ARE THERE DIFFERENCES IN THE METHODS?
[Margin note: The t test is used to determine if two sets of measurements are
statistically different.]
The t test compares two sets of replicate measurements made by two different
methods; one of them will be the test method, and the other will be an accepted
method. A statistical t value is calculated and compared with a tabulated value for
the given number of tests at the desired confidence level (Table 3.1). If the
calculated t value exceeds the tabulated t value, then there is a significant
difference between the results by the two methods at that confidence level. If it
does not exceed the tabulated value, then we can predict that there is no
significant difference between the methods. This in no way implies that the two
results are identical.
There are three ways in which a t test can be used. If an accepted value is
available, a method can be tested against it. Otherwise, a series of replicate
analyses on a single sample may be performed using two methods, or a series of
analyses may be performed on a set of different samples by the two methods.
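1. t Test When an Accepted Value Is Known. Note that Equation 3.9 is a
representation of the true value μ. We can write it

$\mu = \bar{x} \pm \dfrac{ts}{\sqrt{N}}$   (3.11)

It follows that

$\pm t = (\bar{x} - \mu)\dfrac{\sqrt{N}}{s}$   (3.12)

If a good estimate of the "true" value is available from other measurements
(e.g., from a National Institute of Standards and Technology (NIST) standard
reference material or, the ultimate in chemical analysis, an atomic weight), then
Equation 3.12 can be used to determine whether the value obtained from a test
method is statistically equal to the accepted value.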
Example 3.17
You are developing a procedure for determining traces of copper in biological
materials using a wet digestion followed by measurement by atomic absorption
spectrophotometry. In order to test the validity of the method, you obtain an NIST
orchard leaves standard reference material and analyze this material. Five replicas
are sampled and analyzed, and the mean of the results is found to be 10.8 ppm
with a standard deviation of ±0.7 ppm. The listed value is 11.7 ppm. Does your
method give a statistically correct value at the 95% confidence level?
Solution

$\pm t = (\bar{x} - \mu)\dfrac{\sqrt{N}}{s} = (10.8 - 11.7)\dfrac{\sqrt{5}}{0.7}$, so that $|t| = 2.9$

There are five measurements, so there are four degrees of freedom (N − 1). From
Table 3.1, we see that the tabulated value of t at the 95% confidence level is 2.776.
This is less than the calculated value, so there is a determinate error in the new
procedure. That is, there is a 95% probability that the difference between the
reference value and the measured value is not due to chance.
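Equation 3.12 reduces to a one-line function. The Python sketch below is illustrative only (the function name is an assumption); it takes the summary statistics of the copper example and the tabulated t for four degrees of freedom at the 95% level.

```python
import math


def t_versus_accepted(mean, s, n, accepted):
    """Magnitude of t from Equation 3.12 for a mean compared with an accepted value."""
    return abs(mean - accepted) * math.sqrt(n) / s


t_calc = t_versus_accepted(mean=10.8, s=0.7, n=5, accepted=11.7)

T_TABLE_95 = 2.776   # 95% level, nu = N - 1 = 4 (Table 3.1)
determinate_error = t_calc > T_TABLE_95

print(f"t = {t_calc:.1f}; exceeds tabulated t: {determinate_error}")
```

Halving s in the call roughly doubles the calculated t, which is the point made in the text that better precision makes a nonrandom difference easier to detect.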
Note from Equation 3.12 that as the precision is improved, that is, as s
becomes smaller, the calculated t becomes larger. Thus, there is a greater chance that
the tabulated t value will be less than this. That is, as the precision improves, it is
easier to distinguish nonrandom differences. Looking again at Equation 3.12, this
means as s decreases, so must the difference between the two methods (x̄ − μ) in
order for the difference to be ascribed only to random error. What this means is
that comparing very large sets of samples, with a smaller s, will nearly always lead
to a statistically significant difference, but a statistically significant result is not
necessarily important because of the large number of samples that better describe
the population.
2. Comparison of the Means of Two Samples. When the t test is applied
to two sets of data, μ in Equation 3.12 is replaced by the mean of the second set.
The reciprocal of the standard deviation of the mean (√N/s) is replaced by that of
the differences between the two, which is readily shown to be
$\sqrt{N_1 N_2/(N_1 + N_2)}\,/\,s_p$, so that

$\pm t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p}\sqrt{\dfrac{N_1 N_2}{N_1 + N_2}}$   (3.13)

where s_p is the pooled standard deviation of the individual measurements of two
sets.
The pooled standard deviation is sometimes used to obtain an improved
estimate of the precision of a method, and it is used for calculating the precision
of the two sets of data in a paired t test. That is, rather than relying on a single
set of data to describe the precision of a method, it is sometimes preferable
to perform several sets of analyses, for example, on different days, or on different
samples with slightly different compositions. If the indeterminate (random) error
is assumed to be the same for each set, then the data of the different sets can be
pooled. This provides a more reliable estimate of the precision of a method than
is obtained from a single set. The pooled standard deviation s_p is given by

$s_p = \sqrt{\dfrac{\sum (x_{i1} - \bar{x}_1)^2 + \sum (x_{i2} - \bar{x}_2)^2 + \cdots + \sum (x_{ik} - \bar{x}_k)^2}{N - k}}$   (3.14)

where x̄1, x̄2, ..., x̄k are the means of each of k sets of analyses, and x_i1, x_i2, ...,
x_ik are the individual values in each set. N is the total number of measurements
and is equal to (N1 + N2 + ... + Nk). If five sets of 20 analyses each are
performed, k = 5 and N = 100. (The number of samples in each set need not be equal.)
N − k is the degrees of freedom obtained from (N1 − 1) + (N2 − 1) + ... +
(Nk − 1); one degree of freedom is lost for each subset. This equation represents
a combination of the equations for the standard deviations of each set of data.
[Margin note: The F test can be applied to the variances of the two methods rather
than assuming they are statistically equal before applying the t test.]
In applying the t test between two methods, it is assumed that both methods
have essentially the same standard deviation, that is, each represents the precision
of the population (the same σ). This can be verified using the F test above.
Example 3.18
A new gravimetric method is developed for iron(III) in which the iron is
precipitated in crystalline form with an organoboron "cage" compound. The accuracy of
the method is checked by analyzing the iron in an ore sample and comparing with
the results using the standard precipitation with ammonia and weighing of Fe2O3.
The results, reported as % Fe for each analysis, were as follows:

Test Method      Reference Method
20.10%           18.89%
20.50            19.20
18.65            19.00
19.25            19.70
19.40            19.40
19.99
x̄1 = 19.65%      x̄2 = 19.24%

Is there a significant difference between the two methods?
Solution

x_i1     |x_i1 − x̄1|   (x_i1 − x̄1)²      x_i2     |x_i2 − x̄2|   (x_i2 − x̄2)²
20.10    0.45          0.202             18.89    0.35          0.122
20.50    0.85          0.722             19.20    0.04          0.002
18.65    1.00          1.000             19.00    0.24          0.058
19.25    0.40          0.160             19.70    0.46          0.212
19.40    0.25          0.062             19.40    0.16          0.026
19.99    0.34          0.116
Σ(x_i1 − x̄1)² = 2.262                    Σ(x_i2 − x̄2)² = 0.420

$F = \dfrac{s_1^2}{s_2^2} = \dfrac{2.262/5}{0.420/4} = 4.31$

This is less than the tabulated value (6.26), so the two methods have comparable
standard deviations and the t test can be applied.

$s_p = \sqrt{\dfrac{\sum (x_{i1} - \bar{x}_1)^2 + \sum (x_{i2} - \bar{x}_2)^2}{N_1 + N_2 - 2}} = \sqrt{\dfrac{2.262 + 0.420}{6 + 5 - 2}} = 0.546$

$\pm t = \dfrac{19.65 - 19.24}{0.546}\sqrt{\dfrac{(6)(5)}{6 + 5}} = 1.24$

The tabulated t for nine degrees of freedom (N1 + N2 − 2) at the 95% confidence
level is 2.262, so there is no statistical difference in the results by the two methods.
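The sequence used in this example (an F test on the variances, then the pooled standard deviation of Equation 3.14 for two sets, then t) can be sketched in Python. The helper name is an assumption made for illustration; the data are the % Fe results of the example, and the two tabulated values are those quoted in the solution.

```python
import math
import statistics

test_method = [20.10, 20.50, 18.65, 19.25, 19.40, 19.99]   # % Fe
reference = [18.89, 19.20, 19.00, 19.70, 19.40]            # % Fe


def sum_sq_dev(data):
    """Sum of squared deviations of the values from their own mean."""
    m = statistics.mean(data)
    return sum((x - m) ** 2 for x in data)


n1, n2 = len(test_method), len(reference)
ss1, ss2 = sum_sq_dev(test_method), sum_sq_dev(reference)

# F test first, with the larger variance in the numerator.
var1, var2 = ss1 / (n1 - 1), ss2 / (n2 - 1)
f_calc = max(var1, var2) / min(var1, var2)

# Pooled standard deviation for k = 2 sets, then t for the two means.
s_p = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
mean_diff = abs(statistics.mean(test_method) - statistics.mean(reference))
t_calc = mean_diff / s_p * math.sqrt(n1 * n2 / (n1 + n2))

F_TABLE_95 = 6.26    # nu1 = 5, nu2 = 4
T_TABLE_95 = 2.262   # nu = n1 + n2 - 2 = 9

print(f"F = {f_calc:.2f} (tabulated {F_TABLE_95})")
print(f"s_p = {s_p:.3f}, t = {t_calc:.2f} (tabulated {T_TABLE_95})")
```

The t comparison is only meaningful if the F comparison passes, which is why the script computes both.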
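Rather than comparing two methods using one sample, two samples could
be compared for comparability using a single analysis method in a manner
identical to the above examples.
3. Paired t Test. In the clinical chemistry laboratory, a new method is frequently
tested against an accepted method by analyzing several different samples of slightly
varying composition (within physiological range). In this case, the t value is
calculated in a slightly different form. The difference between each of the paired
measurements on each sample is computed. An average difference D̄ is calculated,
and the individual deviations of each from D̄ are used to compute a standard
deviation, s_d. The t value is calculated from

$t = \dfrac{\bar{D}}{s_d}\sqrt{N}$   (3.15)

$s_d = \sqrt{\dfrac{\sum (D_i - \bar{D})^2}{N - 1}}$   (3.16)

where D_i is the individual difference between the two methods for each sample,
with regard to sign; and D̄ is the mean of all the individual differences.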
Example 3.19
You are developing a new analytical method for the determination of blood urea
nitrogen (BUN). You want to determine whether your method differs significantly from
a standard one for analyzing a range of sample concentrations expected to be found
in the routine laboratory. It has been ascertained that the two methods have
comparable precisions. Following are two sets of results for a number of individual samples.

          Your Method   Standard Method
Sample    (mg/dL)       (mg/dL)           D_i     D_i − D̄    (D_i − D̄)²
A         10.2          10.5              −0.3    −0.6       0.36
B         12.7          11.9               0.8     0.5       0.25
C          8.6           8.7              −0.1    −0.4       0.16
D         17.5          16.9               0.6     0.3       0.09
E         11.2          10.9               0.3     0.0       0.00
F         11.5          11.1               0.4     0.1       0.01
                                          Σ = 1.7            Σ = 0.87
                                          D̄ = 0.28

Solution

$s_d = \sqrt{\dfrac{0.87}{6 - 1}} = 0.42$

$t = \dfrac{0.28}{0.42} \times \sqrt{6} = 1.63$

The tabulated t value at the 95% confidence level for five degrees of freedom is
2.571. Therefore, t_calc < t_table and there is no significant difference between the two
methods at this confidence level.
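The paired t test of Equations 3.15 and 3.16 works on the per-sample differences only. The following Python sketch is an illustration with assumed variable names; it uses the BUN data of the example and, because it keeps unrounded intermediate values, gives a t that agrees with the hand calculation only to about two figures.

```python
import math
import statistics

your_method = [10.2, 12.7, 8.6, 17.5, 11.2, 11.5]   # mg/dL, samples A to F
standard = [10.5, 11.9, 8.7, 16.9, 10.9, 11.1]      # mg/dL, samples A to F

# Signed difference for each sample (your method minus standard method).
diffs = [a - b for a, b in zip(your_method, standard)]

n = len(diffs)
d_bar = statistics.mean(diffs)      # mean difference
s_d = statistics.stdev(diffs)       # standard deviation of the differences
t_calc = abs(d_bar) / s_d * math.sqrt(n)

T_TABLE_95 = 2.571                  # 95% level, nu = N - 1 = 5
significant = t_calc > T_TABLE_95

print(f"mean difference = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t_calc:.2f}")
print("significant difference between methods:", significant)
```

Pairing removes the sample-to-sample variation in concentration, which would otherwise swamp the small difference between the methods.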
[Margin note: Finagle's third law: In any collection of data, the figure most obviously
correct, beyond all checking, is the mistake.]
[Margin note: The Q test is used to determine if an "outlier" is due to a determinate
error. If it is not, then it falls within the expected random error and should be
retained.]
[Margin note: "And now the sequence of events in no particular order."—Dan Rather,
television news anchor.]
Usually, a test at the 95% confidence level is considered significant, while
one at the 99% level is highly significant. That is, the smaller the calculated t value,
the more confident you are that there is no significant difference between the two
methods. If you employ too low a confidence level (e.g., 80%), you are likely to
conclude erroneously that there is a significant difference between two methods
(type I error). On the other hand, too high a confidence level will require too large
a difference to detect (type II error). If a calculated t value is near the tabular value
at the 95% confidence level, more tests should be run to ascertain definitely whether
the two methods are significantly different.
3.14 Rejection of a Result: The Q Test
Frequently, when a series of replicate analyses is performed, one of the results will
appear to differ markedly from the others. A decision will have to be made whether
to reject the result or to retain it. Unfortunately, there are no uniform criteria that
can be used to decide if a suspect result can be ascribed to accidental error rather
than chance variation. It is tempting to delete extreme values from a data set
because they will alter the calculated statistics in an unfavorable way, that is, increase
the standard deviation and variance (measures of spread), and they may
substantially alter the reported mean. The only reliable basis for rejection occurs when it
can be decided that some specific error may have been made in obtaining the
doubtful result. No result should be retained in cases where a known error has occurred
in its collection.
Experience and common sense may serve as just as practical a basis for
judging the validity of a particular observation as a statistical test would be.
Frequently, the experienced analyst will gain a good idea of the precision to be
expected in a particular method and will recognize when a particular result is
suspect.
Additionally, an analyst who knows the standard deviation expected of a
method may reject a data point that falls outside 2s or 2.5s of the mean because
there is about 1 chance in 20 or 1 chance in 100 this will occur.
A wide variety of statistical tests have been suggested and used to determine
whether an observation should be rejected. In all of these, a range is established
within which statistically significant observations should fall. The difficulty with
all of them is determining what the range should be. If it is too small, then
perfectly good data will be rejected; and if it is too large, then erroneous
measurements will be retained too high a proportion of the time. The Q test is, among the
several suggested tests, one of the most statistically correct for a fairly small
number of observations and is recommended when a test is necessary. The ratio Q is
calculated by arranging the data in decreasing order of numbers. The difference
between the suspect number and its nearest neighbor (a) is divided by the range
(w), that is, the difference between the highest number and the lowest number.
Referring to the figure in the margin, Q = a/w. This ratio is compared with tabulated
values of Q. If it is equal to or greater than the tabulated value, the suspected
observation can be rejected. The tabulated values of Q at the 90, 95, and 99%
confidence levels are given in Table 3.3. If Q exceeds the tabulated value for a given
number of observations and a given confidence level, the questionable
measurement may be rejected with, for example, 95% confidence that some definite error
is in this measurement.
Table 3.3
Rejection Quotient, Q, at Different Confidence Limits(a)

No. of
Observations    Q90      Q95      Q99
3               0.941    0.970    0.994
4               0.765    0.829    0.926
5               0.642    0.710    0.821
6               0.560    0.625    0.740
7               0.507    0.568    0.680
8               0.468    0.526    0.634
9               0.437    0.493    0.598
10              0.412    0.466    0.568
15              0.338    0.384    0.475
20              0.300    0.342    0.425
25              0.277    0.317    0.393
30              0.260    0.298    0.372

(a) Adapted from D. B. Rorabacher, Anal. Chem., 63 (1991) 139.
Example 3.20
The following set of chloride analyses on separate aliquots of a pooled serum were
reported: 103, 106, 107, and 114 meq/L. One value appears suspect. Determine if
it can be ascribed to accidental error, at the 95% confidence level.
Solution
The suspect result is 114 meq/L. It differs from its nearest neighbor, 107 meq/L,
by 7 meq/L. The range is 114 to 103, or 11 meq/L. Q is therefore 7/11 = 0.64.
The tabulated value for four observations is 0.829. Since the calculated Q is less
than the tabulated Q, the suspected number may be ascribed to random error and
should not be rejected.
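The Q test is simple enough to automate. The Python sketch below is one possible implementation, with assumed names: it sorts the data, takes whichever end value has the larger gap to its neighbor as the suspect, and compares Q = a/w with the 95% values from Table 3.3 for 3 to 10 observations.

```python
# 95% confidence rejection quotients for 3 to 10 observations (Table 3.3).
Q_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
        7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def q_test(data, q_table=Q_95):
    """Return (suspect, Q, reject) for the most outlying end value."""
    ordered = sorted(data)
    w = ordered[-1] - ordered[0]            # range
    if w == 0:
        raise ValueError("all values are identical; Q is undefined")
    gap_low = ordered[1] - ordered[0]       # gap of the lowest value
    gap_high = ordered[-1] - ordered[-2]    # gap of the highest value
    if gap_high >= gap_low:
        suspect, a = ordered[-1], gap_high
    else:
        suspect, a = ordered[0], gap_low
    q = a / w
    # Rejection criterion: Q equal to or greater than the tabulated value.
    return suspect, q, q >= q_table[len(ordered)]


chloride = [103, 106, 107, 114]             # meq/L
suspect, q, reject = q_test(chloride)
print(f"suspect = {suspect} meq/L, Q = {q:.2f}, reject: {reject}")
```

A passing Q test is not proof that a value is good; it only says the data alone do not justify discarding it.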
For a small number of measurements (e.g., three to five), the discrepancy of
the measurement must be quite large before it can be rejected by this criterion, and
it is likely that erroneous results may be retained. This would cause a significant
change in the arithmetic mean because the mean is greatly influenced by a
discordant value. For this reason it has been suggested that the median rather than
the mean be reported when a discordant number cannot be rejected from a small
number of measurements. The median is the middle result of an odd number of
results, or the average of the central pair for an even number, when they are arranged
in order of magnitude. The median has the advantage of not being unduly
influenced by an outlying value. In the above example, the median could be taken as
the average of the two middle values [= (106 + 107)/2 ≈ 106]. This compares
with a mean of 108, which is influenced more by the suspected number.
The following procedure is suggested for interpretation of the data of three
to five measurements if the precision is considerably poorer than expected and if
one of the observations is considerably different from the others of the set.
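1. Estimate the precision that can reasonably be expected for the method in deciding
whether a particular number actually is questionable. Note that for three
measurements with two of the points very close, the Q test is likely to fail. (See the
paragraph below.)
2. Check the data leading to the suspected number to see if a definite error
can be identified.
3. If new data cannot be collected, run a Q test.
4. If the Q test indicates retention of the outlying number, consider reporting
the median rather than the mean for small sets of data.
5. As a last resort, run another analysis. Agreement of the new result with
the apparently valid data previously collected will lend support to the opinion
that the suspected result should be rejected. You should avoid, however,
continually running experiments until the "right" answer is obtained.
[Margin note: Consider reporting the median when an outlier cannot quite be rejected.]
[Margin note: Large population statistics do not strictly apply for small populations.]
[Margin note: The median may be a better representative of the true value than
the mean, for small numbers of measurements.]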
The Q test should not be applied to three data points if two are identical. In
that case, the test always indicates rejection of the third value, regardless of the
magnitude of the deviation, because a is equal to w and Q_calc is always equal to 1.
The same obviously applies for three identical data points in four measurements,
and so forth.
3.15 Statistics for Small Data Sets
We have discussed, in previous sections, ways of estimating, for a normally
distributed population, the central value (mean, x̄), the spread of results (standard
deviation, s), and the confidence limits (t test). These statistical values hold strictly
for a large population. In analytical chemistry, we typically deal with fewer than
10 results, and for a given analysis, perhaps 2 or 3. For such small sets of data,
other estimates may be more appropriate.
The Q test in the previous section is designed for small data sets, and we
mentioned there some rules for dealing with suspect results.
THE MEDIAN MAY BE BETTER THAN THE MEAN
The median M may be used as an estimate of the central value. It has the
advantage that it is not markedly influenced by extraneous (outlier) values, as is the mean,
x̄. The efficiency of M, defined as the ratio of the variances of sampling
distributions of these two estimates of the "true" mean value and denoted by E_M, is given
in Table 3.4. It varies from 1 for only two observations (where the median is
necessarily identical with the mean) to 0.64 for large numbers of observations. The
numerical value of the efficiency implies that the median from, for example, 100
observations, where the efficiency is essentially 0.64, conveys as much information
about the central value of the population as does the mean calculated from 64
observations. The median of 10 observations is as efficient in conveying the
information as is the mean from 10 × 0.71 = 7 observations. It may be desirable to use
the median in order to avoid deciding whether a gross error is present, that is,
using the Q test. It has been shown that for three observations from a normal
population, the median is better than the mean of the best two out of three (the two
closest) values.
Table 3.4
Efficiencies and Conversion Factors for 2 to 10 Observations(a)

No. of          Efficiency       Efficiency      Range Deviation    Range Confidence Factor (t_r)
Observations    of Median, E_M   of Range, E_R   Factor, K_R        t_r 0.95     t_r 0.99
2               1.00             1.00            0.89               6.4          31.83
3               0.74             0.99            0.59               1.3           3.01
4               0.84             0.98            0.49               0.72          1.32
5               0.69             0.96            0.43               0.51          0.84
6               0.78             0.93            0.40               0.40          0.63
7               0.67             0.91            0.37               0.33          0.51
8               0.74             0.89            0.35               0.29          0.43
9               0.65             0.87            0.34               0.26          0.37
10              0.71             0.85            0.33               0.23          0.33
∞               0.64             0.00            0.00               0.00          0.00

(a) Adapted from R. B. Dean and W. J. Dixon, Anal. Chem., 23 (1951) 636.
RANGE INSTEAD OF THE STANDARD DEVIATION.
The range R for a small set of measurements is highly efficient for describing the
spread of results. The efficiency of the range, E_R, shown in Table 3.4, is virtually
identical to that of the standard deviation for four or fewer measurements. This high
relative efficiency arises from the fact that the standard deviation is a poor estimate
of the spread for a small number of observations, although it is still the best known
estimate for a given set of data. To convert the range to a measure of spread that is
independent of the number of observations, we must multiply it by the deviation
factor, K_R, given in Table 3.4. This factor adjusts the range so that on average it
reflects the standard deviation of the population, which we represent by s_r:

$s_r = R K_R$   (3.17)

In Example 3.9 the standard deviation of the four weights is 0.69 mg. The range is
1.6 mg. Multiplying by K_R for four observations, s_r = 1.6 mg × 0.49 = 0.78 mg. As
N increases, the efficiency of the range decreases relative to the standard deviation.
The median M may be used in computing the standard deviation, in order to
minimize the influence of extraneous values. Taking Example 3.9 again, the
standard deviation calculated using the median, 29.8, in place of the mean in Equation
3.2, is 0.73 mg, instead of 0.69 mg.
CONFIDENCE LIMITS USING THE RANGE
Confidence limits could be calculated using s_r, obtained from the range, in place of
s in Equation 3.9, and a corresponding but different t table. It is more convenient,
though, to calculate the limits directly from the range as

Confidence limit = $\bar{x} \pm R t_r$   (3.18)

The factor for converting R to s_r has been included in the quantity t_r, which is
tabulated in Table 3.4 for 99 and 95% confidence levels. The calculated confidence
limit at the 95% confidence level in Example 3.15 using Equation 3.18 is 93.50 ±
0.15(1.3) = 93.50 ± 0.20% Na2CO3.
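The range-based estimates of Equations 3.17 and 3.18 need only a lookup and a multiplication. The sketch below is an illustration in Python with assumed names; the K_R and 95% t_r factors are the values tabulated in Table 3.4 for 2 to 10 observations, and the data are the triplicate soda ash results of Example 3.15.

```python
import statistics

# Range deviation factors and 95% range confidence factors (Table 3.4).
K_R = {2: 0.89, 3: 0.59, 4: 0.49, 5: 0.43, 6: 0.40,
       7: 0.37, 8: 0.35, 9: 0.34, 10: 0.33}
T_R_95 = {2: 6.4, 3: 1.3, 4: 0.72, 5: 0.51, 6: 0.40,
          7: 0.33, 8: 0.29, 9: 0.26, 10: 0.23}


def spread_from_range(data):
    """Estimate of the standard deviation from the range (Equation 3.17)."""
    r = max(data) - min(data)
    return r * K_R[len(data)]


def range_confidence_limit(data, t_r=T_R_95):
    """Return (mean, half_width) from the range (Equation 3.18)."""
    r = max(data) - min(data)
    return statistics.mean(data), r * t_r[len(data)]


soda_ash = [93.50, 93.58, 93.43]            # % Na2CO3
s_r = spread_from_range(soda_ash)
mean, half_width = range_confidence_limit(soda_ash)

print(f"s_r = {s_r:.3f} (compare s = {statistics.stdev(soda_ash):.3f})")
print(f"{mean:.2f} +/- {half_width:.3f} % Na2CO3")
```

For three results the range-based limit is close to the one obtained from t and s, which is the practical argument for using the range with very small data sets.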
[Margin note: The range is as good a measure of the spread of results as is the
standard deviation for four or fewer measurements.]
[Margin note: "If a straight line fit is required, obtain only two data points."—Anonymous]
[Fig. 3.7. Straight-line plot.]
3.16 Linear Least Squares—How to Plot the Right Straight Line
The analyst is frequently confronted with plotting data that fall on a straight line,
as in an analytical calibration curve. Graphing, that is, curve fitting, is critically
important in obtaining accurate analytical data. It is the calibration graph that is
used to calculate the unknown concentration. Straight-line predictability and
consistency will determine the accuracy of the unknown calculation. All measurements
will have a degree of uncertainty, and so will the plotted straight line. Graphing
is often done intuitively, that is, by simply "eyeballing" the best straight line by
placing a ruler through the points, which invariably have some scatter. A better
approach is to apply statistics to define the most probable straight-line fit of the data.
The availability of statistical functions in spreadsheets today makes it
straightforward to prepare straight-line, or even nonlinear, fits. We will first learn the
computations that are involved in curve fitting and statistical evaluation.
If a straight-line relationship is assumed, then the data fit the equation

$y = mx + b$   (3.19)

where y is the dependent variable, x is the independent variable, m is the slope
of the curve, and b is the intercept on the ordinate (y axis); y is usually the
measured variable, plotted as a function of changing x (see Figure 3.7). In a
spectrophotometric calibration curve, y would represent the measured absorbances and
x would be the concentrations of the standards. Our problem, then, is to establish
values for m and b.
LEAST-SQUARES PLOTS.
It can be shown statistically that the best straight line through a series of
experimental points is that line for which the sum of the squares of the deviations (the
residuals) of the points from the line is minimum. This is known as the method of
least squares. If x is the fixed variable (e.g., concentration) and y is the measured
variable (absorbance in a spectrophotometric measurement, the peak area in a
chromatographic measurement, etc.), then the deviation of y vertically from the line at a
given value of x (x_i) is of interest. If y_l is the value on the line, it is equal to
mx_i + b. The sum of the squares of the differences, S, is then

$S = \sum (y_i - y_l)^2 = \sum [y_i - (mx_i + b)]^2$   (3.20)

This equation assumes no error in x, the independent variable.
[Margin note: The least-squares slope and intercept define the most probable straight
line.]
The best straight line occurs when S goes through a minimum. This is
obtained by use of differential calculus by setting the derivatives of S with respect to
m and b equal to zero and solving for m and b. The result is

$m = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$   (3.21)

$b = \bar{y} - m\bar{x}$   (3.22)

where x̄ is the mean of all the values of x_i and ȳ is the mean of all the values
of y_i. The use of differences in calculations is cumbersome, and Equation 3.21 can
be transformed into an easier to use form, especially if a calculator is available:

$m = \dfrac{\sum x_i y_i - \left[(\sum x_i \sum y_i)/n\right]}{\sum x_i^2 - \left[(\sum x_i)^2/n\right]}$   (3.23)

where n is the number of data points.
Example 3.21
Riboflavin (vitamin B2) is determined in a cereal sample by measuring its
fluorescence intensity in 5% acetic acid solution. A calibration curve was prepared
by measuring the fluorescence intensities of a series of standards of increasing
concentrations. The following data were obtained. Use the method of least squares
to obtain the best straight line for the calibration curve and to calculate the
concentration of riboflavin in the sample solution. The sample fluorescence
intensity was 15.4.
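Riboflavin,      Fluorescence Intensity,
µg/mL ($x_i$)    Arbitrary Units ($y_i$)    $x_i^2$     $x_i y_i$
0.000             0.0                       0.0000       0.00
0.100             5.8                       0.0100       0.58
0.200            12.2                       0.0400       2.44
0.400            22.3                       0.1600       8.92
0.800            43.3                       0.6400      34.64
$\sum x_i = 1.500$    $\sum y_i = 83.6$    $\sum x_i^2 = 0.8500$    $\sum x_i y_i = 46.58$
$(\sum x_i)^2 = 2.2500$    $\bar{x} = 0.3000$    $\bar{y} = 16.72$    $n = 5$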
Solution
Using Equations 3.23 and 3.22,
$$m = \frac{46.58 - [(1.500 \times 83.6)/5]}{0.8500 - (2.2500/5)} = 53.75 \text{ fluor. units/ppm}$$
$$b = 16.72 - (53.75 \times 0.3000) = 0.60 \text{ fluor. units}$$
The standard deviations of m and b give an equation from which the uncertainty in
the unknown is calculated, using propagation of error.
Fig. 3.8. Least-squares plot of data from Example 3.21 (fluorescence intensity
versus riboflavin, ppm).
We have retained the maximum number of significant figures in computation. Since
the experimental values of y are obtained to only the first decimal place, we can
round m and b to the first decimal. The equation of the straight line is (FU =
fluorescence units; ppm = µg/mL)
$$y(\text{FU}) = 53.8(\text{FU/ppm})\,x(\text{ppm}) + 0.6(\text{FU})$$
The sample concentration is
$$15.4 = 53.8x + 0.6$$
$$x = 0.275\ \mu\text{g/mL}$$
To prepare an actual plot of the line, take two arbitrary values of x sufficiently
far apart and calculate the corresponding y values (or vice versa) and use these as
points to draw the line. The intercept y = 0.6 (at x = 0) could be used as one point.
At 0.500 µg/mL, y = 27.5. A plot of the experimental data and the least-squares
line drawn through them is shown in Figure 3.8. This was plotted using Excel, with
the equation of the line and the square of the correlation coefficient (a measure of
agreement between the two variables; ignore this for now, we will discuss it later).
The program automatically gives additional figures, but note the agreement with
our calculated values for the slope and intercept.
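The hand calculation in Example 3.21 can also be checked with a few lines of code. What follows is a minimal Python sketch, not part of the original example; the function name `least_squares` and the variable names are illustrative assumptions. It applies Equations 3.23 and 3.22 to the calibration data and then solves the line for the sample reading of 15.4. No intermediate rounding is done, so the last digit may differ slightly from the hand result.

```python
# Calibration data from Example 3.21 (x in ppm, y in fluorescence units)
x = [0.000, 0.100, 0.200, 0.400, 0.800]
y = [0.0, 5.8, 12.2, 22.3, 43.3]


def least_squares(xs, ys):
    """Return (slope, intercept) using Equations 3.23 and 3.22."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(xi * xi for xi in xs)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    m = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # Eq. 3.23
    b = sum_y / n - m * sum_x / n                                  # Eq. 3.22
    return m, b


m, b = least_squares(x, y)

y_sample = 15.4                  # fluorescence reading of the unknown
x_sample = (y_sample - b) / m    # solve y = mx + b for x

print(f"slope = {m:.2f} FU/ppm, intercept = {b:.2f} FU")
print(f"sample concentration = {x_sample:.3f} ppm")
```

Working from the sums rather than from the differences mirrors the calculator-friendly form of Equation 3.23.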
STANDARD DEVIATIONS OF THE SLOPE AND
INTERCEPT: THEY DETERMINE THE UNKNOWN UNCERTAINTY
Each data point on the least-squares line exhibits a normal (Gaussian) distribution
about the line on the y axis. The deviation of each $y_i$ from the line is
$y_i - y_l = y_i - (mx_i + b)$, as in Equation 3.20. The standard deviation of each
of these y-axis deviations is given by an equation analogous to Equation 3.2 except
that there are two fewer degrees of freedom since two are used in defining the slope
and the intercept:
$$s_y = \sqrt{\frac{\sum (y_i - y_l)^2}{N - 2}} = \sqrt{\frac{\left[\sum y_i^2 - (\sum y_i)^2/N\right] - m^2\left[\sum x_i^2 - (\sum x_i)^2/N\right]}{N - 2}} \qquad (3.24)$$
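This quantity is also called the standard deviation of regression, $s_r$. The value
can be used to obtain uncertainties for the slope, m, and intercept, b, of the
least-squares line since they are related to the uncertainty in each value of y.
For the slope:
$$s_m = \sqrt{\frac{s_y^2}{\sum (\bar{x} - x_i)^2}} = \sqrt{\frac{s_y^2}{\sum x_i^2 - (\sum x_i)^2/N}} \qquad (3.25)$$
where $\bar{x}$ is the mean of all x values. For the intercept:
$$s_b = s_y\sqrt{\frac{\sum x_i^2}{N\sum x_i^2 - (\sum x_i)^2}} = s_y\sqrt{\frac{1}{N - (\sum x_i)^2/\sum x_i^2}} \qquad (3.26)$$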
In calculating an unknown concentration, $x_i$, from Equation 3.19, representing
the least-squares line, the uncertainties in y, m, and b are all propagated in the
usual manner, from which we can determine the uncertainty in the unknown.
Example 3.22
Estimate the uncertainty in the slope, intercept, and y for the least-squares plot in
Example 3.21, and the uncertainty in the determined riboflavin concentration.
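Solution
In order to solve for all the uncertainties, we need values for $\sum y_i^2$,
$(\sum y_i)^2$, $\sum x_i^2$, $(\sum x_i)^2$, and $m^2$. From Example 3.21,
$(\sum y_i)^2 = (83.6)^2 = 6989.0$; $\sum x_i^2 = 0.8500$; $(\sum x_i)^2 = 2.2500$;
and $m^2 = (53.75)^2 = 2889$. The $(y_i)^2$ values are $(0.0)^2$, $(5.8)^2$,
$(12.2)^2$, $(22.3)^2$, and $(43.3)^2$ = 0.0, 33.6, 148.8, 497.3, and 1874.9, and
$\sum y_i^2 = 2554.6$ (carrying extra figures). From Equation 3.24,
$$s_y = \sqrt{\frac{(2554.6 - 6989.0/5) - 2889(0.8500 - 2.2500/5)}{5 - 2}} = \pm 0.63$$
From Equation 3.25,
$$s_m = \sqrt{\frac{(0.63)^2}{0.8500 - 2.2500/5}} = \pm 1.0$$
From Equation 3.26,
$$s_b = 0.63\sqrt{\frac{0.8500}{5(0.8500) - 2.2500}} = \pm 0.41$$
Therefore, m = 53.8 ± 1.0 and b = 0.6 ± 0.4.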
A correlation coefficient near 1 means there is a direct relationship between two
variables, e.g., absorbance and concentration.
The unknown riboflavin concentration is calculated from
$$x = \frac{(y \pm s_y) - (b \pm s_b)}{m \pm s_m} = \frac{(15.4 \pm 0.6) - (0.6 \pm 0.4)}{53.8 \pm 1.0}$$
Applying the principles of propagation of error (absolute variances in the numerator
are additive, relative variances in the division step are additive), we calculate
that x = 0.275 ± 0.014 ppm.
See Chapter 16 for the spreadsheet calculation of the standard deviation of
regression and the standard deviation of an unknown for this example.
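The uncertainty calculation can be scripted in the same way. The Python sketch below is an illustration only, with variable names of our own choosing. It evaluates Equations 3.24 to 3.26 for the riboflavin data and then propagates the errors into the unknown as described above: absolute variances add in the numerator, relative variances add in the division. As an assumption carried over from the worked example, the sample reading is given the uncertainty $s_y$ and any correlation between m and b is ignored. Because unrounded values are carried throughout, the results differ slightly in the last digit from the hand calculation.

```python
import math

# Calibration data from Example 3.21
x = [0.000, 0.100, 0.200, 0.400, 0.800]
y = [0.0, 5.8, 12.2, 22.3, 43.3]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b_ for a, b_ in zip(x, y))

sxx = sum_x2 - sum_x ** 2 / n                 # sum of (x_i - mean x)^2
m = (sum_xy - sum_x * sum_y / n) / sxx        # Eq. 3.23
b = sum_y / n - m * sum_x / n                 # Eq. 3.22

# Eq. 3.24: standard deviation of regression
s_y = math.sqrt(((sum_y2 - sum_y ** 2 / n) - m ** 2 * sxx) / (n - 2))
# Eq. 3.25: standard deviation of the slope
s_m = math.sqrt(s_y ** 2 / sxx)
# Eq. 3.26: standard deviation of the intercept
s_b = s_y * math.sqrt(sum_x2 / (n * sum_x2 - sum_x ** 2))

# Unknown: x = (y - b) / m, with simple propagation of error
y_unknown = 15.4
x_unknown = (y_unknown - b) / m
s_numerator = math.sqrt(s_y ** 2 + s_b ** 2)            # absolute variances add
rel_x = math.sqrt((s_numerator / (y_unknown - b)) ** 2  # relative variances add
                  + (s_m / m) ** 2)
s_x = x_unknown * rel_x

print(f"s_y = {s_y:.2f}, s_m = {s_m:.2f}, s_b = {s_b:.2f}")
print(f"x = {x_unknown:.3f} +/- {s_x:.3f} ppm")
```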
3.17 Correlation Coefficient and Coefficient of Determination
The correlation coefficient is used as a measure of the correlation between two
variables. When variables x and y are correlated rather than being functionally
related (i.e., are not directly dependent upon one another), we do not speak of the
"best" y value corresponding to a given x value, but only of the most "probable"
value. The closer the observed values are to the most probable values the more
0.99 indicates excellent linearity. An r >
0.999 can sometimes be obtained with care.
The correlation coefficient gives the dependent and independent variables
equal weight, which is usually not true in scientific measurements. The r value
tends to give more confidence in the goodness of fit than warranted. The fit must
be quite poor before r becomes smaller than about 0.98 and is really very poor
when less than 0.9.
A more conservative measure of closeness of fit is the square of the correlation
coefficient, $r^2$, and this is what most statistical programs calculate (including
Excel; see Figure 3.8). An r value of 0.90 corresponds to an $r^2$ value of only 0.81,
while an r of 0.95 is equivalent to an $r^2$ of 0.90. The goodness of fit is judged by
the number of 9's. So three 9's (0.999) or better represents an excellent fit. We will
use $r^2$ as a measure of fit. This is also called the coefficient of determination.
It should be mentioned that it is possible to have a high degree of correlation
between two methods ($r^2$ near unity) but to have a statistically significant
difference between the results of each according to the t test. This would occur, for
example, if there were a constant determinate error in one method. This would
make the differences significant (not due to chance), but there would be a direct
correlation between the results [$r^2$ would be near unity, but the slope (m) may not
be near unity or the intercept (b) not near zero]. In principle, an empirical
correction factor (a constant) could be applied to make the results by each method the
same over the concentration range analyzed.
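To see how r and $r^2$ behave for real data, the following Python sketch computes both for the riboflavin calibration of Example 3.21. It is offered as an illustration, not as the text's own procedure: the function `pearson_r` is a hypothetical helper, and it assumes the standard Pearson product-moment definition of the correlation coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (standard product-moment form, assumed here)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    syy = sum((yi - mean_y) ** 2 for yi in ys)
    return sxy / math.sqrt(sxx * syy)


# Calibration data from Example 3.21
x = [0.000, 0.100, 0.200, 0.400, 0.800]
y = [0.0, 5.8, 12.2, 22.3, 43.3]

r = pearson_r(x, y)
r2 = r ** 2          # coefficient of determination
print(f"r = {r:.4f}, r^2 = {r2:.4f}")

# Squaring makes the measure more conservative: r = 0.90 is only r^2 = 0.81
for r_value in (0.90, 0.95, 0.999):
    print(f"r = {r_value}  ->  r^2 = {r_value ** 2:.3f}")
```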
The coefficient of determination ($r^2$) is a better measure of fit.
3.18 Using Spreadsheets for Plotting Calibration Curves
The availability of spreadsheets makes it unnecessary to plot data on graph paper
and do hand calculations for the least-squares regression analysis and statistics.
We will use the data in Example 3.21 to prepare the plot shown in Figure 3.8, using
Excel.
Open a new spreadsheet and enter:
Cell A1: Riboflavin, ppm (adjust the column width to incorporate the text)
Cell B1: Fluorescence intensity
Cell A3: 0.000
Cell A4: 0.100
Cell A5: 0.200
Cell A6: 0.400
Cell A7: 0.800
Cell B3: 0.0
Cell B4: 5.8
Cell B5: 12.2
Cell B6: 22.3
Cell B7: 43.3
Format the cell numbers to have three decimal places for column A and one for
column B.
Click on the Chart Wizard icon on the toolbar (the one with the vertical bars).
Step 1 (Chart Type) of the Chart Wizard will appear.
Follow the following sequences:
Select XY (Scatter) and Scatter (no line) for Chart sub-type
Next
Data Range: enter A3:B7 (click on Series, and note the X values and Y values
addresses)
Check: Columns (after going back to Data Range)
Next
Chart title: enter Calibration Curve
Value (X) axis: enter Riboflavin
Value (Y) axis: enter Fluorescence intensity
Gridlines: uncheck Major gridlines
Legend: Delete Show legend
Data labels: None (Try Show Value, and note the data entered on each point
on the line)
Next
Click on As New sheet: Chart 1
Finish
The calibration graph is plotted on a new Excel sheet.
Now we wish to enter the least-squares equation line and the $R^2$ value. Click
on the figure, and Chart will appear in the toolbar. Click on it and continue:
Add Trendline
Linear
Options
Display equation on chart
Display R-squared value on chart
OK
Now look at the chart. Click on it to remove the end markers. You can move the
equation on the line toward the left and enlarge it. Click on the equation so it is
highlighted with small squares. Click on a corner and drag it to the left, down the line.
You can increase the font size. Click on Format: Select Data Labels: Font. Select size
14, then OK. Drag the equation close to the line. You can also increase the font size
of the axis labels by highlighting them and doing the same, as well as the title.
Let's get rid of the gray background. Click on the gray area, then Format:
Select Plot Area. Click on the white color square, then OK. The chart you have
now prepared should look similar to Figure 3.8.
When you prepare the graph, you can initially highlight the cells (A3:B7) that
you want to graph, and the addresses will automatically be placed in the Data Range.
Instead of placing the graph on a new sheet, you could have selected As object in
Sheet 1. This would have placed it in the spreadsheet in which you entered the data.
You can adjust its position and size by clicking on it, and dragging the corner.
Figure 3.9 shows the graph inserted into the spreadsheet. Try doing this. Once you have
the graph inserted in the spreadsheet, this becomes a generic plot for new data, that
is, if you change the data in columns A and B, a new line is automatically charted.
Try this. (You should save your original spreadsheet graph and rename the new one.)
You may print only the graph by first clicking on it to highlight it.
3.19 Slope, Intercept, and Coefficient of Determination
We can use the Excel statistical functions to calculate the slope and intercept for a
series of data, and the $R^2$ value, without a plot. Open a new spreadsheet and enter
the calibration data from Example 3.21, as in Figure 3.9, in cells A3:B7. In cell A9
type Intercept, in cell A10, Slope, and in cell A11, $R^2$. Highlight cell B9, click on
$f_x$: Statistical, and scroll down to INTERCEPT under Function name, and click OK.
For Known_x's, enter the array A3:A7, and for Known_y's, enter B3:B7. Click
OK, and the intercept is displayed in cell B9. Now repeat, highlighting cell B10,
scrolling to SLOPE, and entering the same arrays. The slope appears in cell B10.
Repeat again, highlighting cell B11, and scrolling to RSQ. $R^2$ appears in cell B11.
Compare with the values in Figure 3.9.
Fig. 3.9. Calibration graph inserted in spreadsheet (Sheet 1).
3.20 LINEST for Additional Statistics
The LINEST program of Excel allows us to quickly obtain several statistical
functions for a set of data, in particular, the slope and its standard deviation, the
intercept and its standard deviation, the coefficient of determination, and the
standard error of the estimate, besides others we will not discuss now. LINEST will
automatically calculate a total of 10 functions in 2 columns of the spreadsheet.
Open a new spreadsheet, and enter the calibration data from Example 3.21
as you did above, in cells A3:B7. Refer to Figure 3.10. The statistical data will be
placed in 10 cells, so let's label them now. We will place them in cells B9:C13.
Type labels as follows:
Cell A9: slope
Cell A10: std. dev.
Cell A11: $R^2$
Cell A12: F
Cell A13: sum sq. regr.
Cell D9: intercept
Cell D10: std. dev.
Cell D11: std. error of estim.
Cell D12: d.f.
Cell D13: sum sq. resid.
Highlight cells B9:C13, and click on $f_x$. From the Statistical function, scroll down
to LINEST and click OK. For Known_y's, enter the array B3:B7, and for
Known_x's enter A3:A7. Then in each of the boxes labeled Const and Stats, type
"true". Now we have to use the keyboard to execute the calculations. Depress Shift,
Control, and Enter, and release. The statistical data are entered into the highlighted
cells. This keystroke combination must be used whenever performing a function
on an array of cells like here. The slope is in cell B9 and its standard deviation in
cell B10. The intercept is in cell C9 and its standard deviation in cell C10. The
coefficient of determination is in cell B11. Compare the standard deviations with
those calculated in Example 3.22, and the slope, intercept, and $R^2$ with Example
3.21 or Figure 3.8.
Cell C11 contains the standard error of the estimate (or standard deviation of
the regression) and is a measure of the error in estimating values of y. The smaller
it is, the closer the numbers are to the line. The other cells contain data we will
not consider here: Cell B12 is the F value, cell C12 the degrees of freedom (used
for F), cell B13 the sum of squares of the regression, and cell C13 the sum of
squares of the residuals.
How many significant figures should we keep for the least-squares line? The
standard deviations give us the answer. The slope has a standard deviation of 1.0,
and so we write the slope as 53.8 ± 1.0 at best. The intercept standard deviation is
±0.42, so for the intercept we write 0.6 ± 0.4. See also Example 3.22.
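The ten statistics described above can be reproduced in a short script, which makes it clear where each one comes from. The Python sketch below is an illustration under our own naming (`linest_table` is a hypothetical helper, not an Excel or library function). It returns the values in the same five-row, two-column layout used for cells B9:C13.

```python
import math


def linest_table(xs, ys):
    """Return the ten regression statistics as five rows of two values,
    laid out like the B9:C13 block described in the text."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    syy = sum((yi - mean_y) ** 2 for yi in ys)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_resid = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(xs, ys))
    ss_regr = syy - ss_resid
    df = n - 2                                   # two parameters were fitted

    s_y = math.sqrt(ss_resid / df)               # standard error of the estimate
    s_slope = s_y / math.sqrt(sxx)
    s_intercept = s_y * math.sqrt(sum(xi * xi for xi in xs) / (n * sxx))
    r_squared = ss_regr / syy
    f_value = ss_regr / (ss_resid / df)

    return [
        [slope, intercept],        # row 9
        [s_slope, s_intercept],    # row 10
        [r_squared, s_y],          # row 11
        [f_value, df],             # row 12
        [ss_regr, ss_resid],       # row 13
    ]


# Calibration data from Example 3.21
x = [0.000, 0.100, 0.200, 0.400, 0.800]
y = [0.0, 5.8, 12.2, 22.3, 43.3]

table = linest_table(x, y)
for left, right in table:
    print(f"{left:12.4f} {right:12.4f}")
```

Reading the second row of the output against the first shows directly how many figures of the slope and intercept are meaningful.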
3.21 Statistics Software Packages
Excel offers a number of statistical functions, listed under the Tools menu. Go to
Add-Ins, and check Analysis ToolPak. Click OK and return to the spreadsheet. Now
when you go to the Tools menu, you will see Data Analysis. Go to that, and you
will see 19 statistical programs listed. As you experiment with these, you will find
some very useful. One Add-In that is very useful is Solver, for solving complicated
formulas. Its use is described in Chapter 6. See also the text website
www.wiley.com/college/christian for a list of some commercial software packages for
performing basic as well as more advanced statistical calculations.
Fig. 3.10. Using LINEST for statistics.
3.22 Detection Limits: There Is No Such Thing as Zero
The previous discussions have dealt with statistical methods to estimate the
reliability of analyses at specific confidence levels, these being ultimately determined
by the precision of the method. All instrumental methods have a degree of noise
associated with the measurement that limits the amount of analyte that can be
detected. The noise is reflected in the precision of the blank or background signal,
and noise may be apparent even when there is no significant blank signal. This
may be due to fluctuation in the dark current of a photomultiplier tube, flame flicker
in an atomic absorption instrument, and other factors.
The concentration that gives a signal equal to three times the standard deviation
of the background is generally taken as the detection limit.
Fig. 3.11. Peak-to-peak noise level as a basis for detection limit.
The limit of detection is the lowest concentration level that can be determined
to be statistically different from an analyte blank. There are numerous ways that
detection limits have been defined. For example, the concentration that gives twice
the peak-to-peak noise of a series of background signal measurements (or of a
continuously recorded background signal) may be taken as the detection limit (see
Figure 3.11). A generally accepted detection limit is the concentration that gives a
signal three times the standard deviation of the background signal.
Example 3.23
A series of sequential baseline absorbance measurements are made in a
spectrophotometric method, for determining the purity of aspirin in tablets using a
blank solution. The absorbance readings are 0.002, 0.000, 0.008, 0.006, and 0.003. A
standard 1 ppm aspirin solution gives an absorbance reading of 0.051. What is the
detection limit?
Solution
The standard deviation of the blank readings is 0.0032 absorbance units, and the
mean of the blank readings is 0.004 absorbance units. The detection limit is that
concentration of analyte that gives a reading of 3 × 0.0032 = 0.0096 absorbance
reading, above the blank signal. The net reading for the standard is 0.051 − 0.004 =
0.047. The detection limit would correspond to 1 ppm (0.0096/0.047) = 0.2 ppm
and would give a total absorbance reading of 0.0096 + 0.004 = 0.014.
The precision at the detection limit is by definition about 33%. For quantitative
measurements, concentrations should be at least 10 times the detection limit
(2 ppm in the above example).
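The arithmetic of Example 3.23 is easy to script. The Python sketch below is an illustration with variable names of our own choosing. It uses the sample standard deviation of the blank readings and the three-standard-deviation criterion described above. The mean blank is not rounded to 0.004 before subtraction, so the printed values may differ slightly in the last digit from the hand calculation.

```python
import statistics

# Data from Example 3.23
blank_readings = [0.002, 0.000, 0.008, 0.006, 0.003]   # absorbance units
standard_conc = 1.0                                    # ppm aspirin
standard_abs = 0.051                                   # absorbance of the standard

s_blank = statistics.stdev(blank_readings)    # sample standard deviation (n - 1)
mean_blank = statistics.mean(blank_readings)

net_standard = standard_abs - mean_blank      # blank-corrected standard reading
signal_at_limit = 3 * s_blank                 # three times the background s
detection_limit = standard_conc * signal_at_limit / net_standard

# Quantitative work should be at least 10 times the detection limit
quantitative_minimum = 10 * detection_limit

print(f"s(blank) = {s_blank:.4f}, mean(blank) = {mean_blank:.4f}")
print(f"detection limit = {detection_limit:.2f} ppm")
print(f"lowest quantitative level = {quantitative_minimum:.1f} ppm")
```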
There have been various attempts to place the concept of detection limit on
a more firm statistical ground. The International Conference on Harmonization
(ICH; see Chapter 4) of Technical Requirements for Registration of Pharmaceuticals
for Human Use has proposed guidelines for analytical method validation (Ref.
18). The ICH Q2B guideline on validation methodology suggests calculation based
on the standard deviation, s, of the response and the slope or sensitivity, S, of the
calibration curve at levels approaching the limit. For the limit of detection (LOD),
$$\text{LOD} = 3.3(s/S) \qquad (3.29)$$
And for limit of quantitation (LOQ)
$$\text{LOQ} = 10(s/S) \qquad (3.30)$$
The standard deviation of the response can be determined based on the standard
deviation of either the blank, the residual standard deviation of the least-squares
regression line, or the standard deviation of the y intercept of the regression line.
The Excel statistical function can be used to obtain the last two.
The International Union of Pure and Applied Chemistry (IUPAC) uses a value
of 3 in Equation 3.29 (for blank measurements), derived from a confidence level
of 95% for a reasonable number of measurements. The confidence level, of course,
varies with the number of measurements, and 7 to 10 measurements should be
taken. The bottom line is that one should regard a detection limit as an approximate
guide to performance and not make efforts to determine it too precisely.
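Equations 3.29 and 3.30 translate directly into code. In the Python sketch below, the function names are hypothetical, and the numerical inputs are an assumption made only for illustration: we borrow the residual standard deviation and slope found for the riboflavin calibration in Examples 3.21 and 3.22, which the text itself does not use for this purpose.

```python
def limit_of_detection(s, slope):
    """Equation 3.29: LOD = 3.3 (s / S)."""
    return 3.3 * s / slope


def limit_of_quantitation(s, slope):
    """Equation 3.30: LOQ = 10 (s / S)."""
    return 10 * s / slope


# Illustrative inputs (assumed): residual standard deviation of the regression
# and slope from the riboflavin calibration line
s_response = 0.63     # fluorescence units
sensitivity = 53.8    # fluorescence units per ppm

lod = limit_of_detection(s_response, sensitivity)
loq = limit_of_quantitation(s_response, sensitivity)
print(f"LOD = {lod:.3f} ppm, LOQ = {loq:.3f} ppm")
```

Whichever estimate of s is used (blank, regression residuals, or intercept), it should be stated along with the result, since the three can give noticeably different limits.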
3.23 Statistics of Sampling: How Many Samples, How Large?
The acquiring of a valid analytical sample is perhaps the most critical part of any
analysis. The physical sampling of different types of materials (solids, liquids,
gases) is discussed in Chapter 2. We describe here some of the statistical
considerations in sampling.
THE PRECISION OF A RESULT: SAMPLING IS THE KEY
More often than not, the accuracy and precision of an analysis is limited by the
sampling rather than the measurement step. The overall variance of an analysis is
the sum of the sampling variance and the variance of the remaining analytical
operations, that is,
$$s_o^2 = s_s^2 + s_a^2 \qquad (3.31)$$
If the variance due to sampling is known (by having performed multiple
samplings of the material of interest and analyzing it using a precise measurement
technique), then there is little to be gained by reduction of $s_a$ to less than
$\frac{1}{3}s_s$. For example, if the absolute standard deviation for sampling is 3.0%
and that of the analysis is 1.0%, then $s_o^2 = (1.0)^2 + (3.0)^2 = 10.0$, or
$s_o = 3.2\%$. Here, 94% of the imprecision is due to sampling and only 6% is due to
measurement ($s_o$ is increased from 3.0 to 3.2%, so 0.2% is due to the measurement).
If the sampling imprecision is relatively large, it is better to use a rapid, lower
precision method and analyze more samples.
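A quick numerical check shows why improving the measurement step beyond this point buys so little. The Python sketch below (an illustration; `overall_sd` is a hypothetical helper) combines the sampling and analytical standard deviations as variances and then halves the analytical standard deviation to see the effect on the overall value.

```python
import math


def overall_sd(s_sampling, s_analysis):
    """Overall standard deviation from the sum of the two variances."""
    return math.sqrt(s_sampling ** 2 + s_analysis ** 2)


s_o = overall_sd(3.0, 1.0)           # values from the example in the text, in %
s_o_improved = overall_sd(3.0, 0.5)  # analytical precision made twice as good

print(f"overall s with s_a = 1.0%: {s_o:.2f}%")
print(f"overall s with s_a = 0.5%: {s_o_improved:.2f}%")
```

Halving the analytical standard deviation changes the overall figure only in the second decimal place, whereas analyzing more samples attacks the dominant sampling term.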
Little is gained by improving the analytical variance to less than one-third the
sampling variance. It is better to analyze more samples using a faster, less precise
method.
The greater the sample size, the smaller the variance.
We are really interested in the value and variance of the true value. The total
variance is $s_{\text{total}}^2 = s_g^2 + s_s^2 + s_a^2$, where $s_g^2$ describes the
"true" variability of the analyte in the system, the value of which is the goal of the
analysis. For reliable interpretation of the chemical analysis, the combined sampling
and analytical variance should not exceed 20% of the total variance. (See M. H.
Ramsey, "Appropriate Precision: Matching Analytical Precision Specifications to the
Particular Application," Anal. Proc., 30 (1993) 110.)
THE "TRUE VALUE"
The range in which the true value falls for the analyte content in a bulk material
can be estimated from a t test at a given confidence level (Equation 3.11). Here,
$\bar{x}$ is the average of the analytical results for the particular material analyzed,
and s is the standard deviation that is obtained previously from analysis of similar
material samples or from the present analysis if there are sufficient samples.
MINIMUM SAMPLE SIZE
Statistical guidelines have been developed for the proper sampling of heterogeneous
materials, based on the sampling variance. The minimum size of individual
increments for a well-mixed population of different kinds of particles can be
estimated from Ingamells' sampling constant, $K_s$:
$$wR^2 = K_s \qquad (3.32)$$
where w is the weight of sample analyzed and R is the percent relative standard
deviation of the sample composition. $K_s$ represents the weight of sample for 1%
sampling uncertainty at a 68% confidence level and is obtained by determining the
standard deviation from the measurement of a series of samples of weight w. This
equation, in effect, says that the sampling variance is inversely proportional to the
sample weight.
Example 3.24
Ingamells' sampling constant for the analysis of the nitrogen content of wheat
samples is 0.50 g. What weight sample should be taken to obtain a sampling precision
of 0.2% rsd in the analysis?
Solution
$$w(0.2)^2 = 0.50\ \text{g}$$
$$w = 12.5\ \text{g}$$
Note that the entire sample is not likely to be analyzed. The 12.5-g gross sample
will be finely ground, and a few hundred milligrams of the homogeneous material
analyzed. If the sample were not made homogeneous, then the bulk of it would
have to be analyzed.
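Equation 3.32 can be rearranged either for the sample weight or for the sampling precision. The Python sketch below is an illustration with hypothetical function names. It reproduces Example 3.24 and then asks the reverse question for a smaller sample.

```python
import math


def sample_weight(k_s, rsd_percent):
    """Weight w (same units as K_s) needed for a given % rsd, from w R^2 = K_s."""
    return k_s / rsd_percent ** 2


def sampling_rsd(k_s, weight):
    """Percent rsd expected for a given sample weight, from w R^2 = K_s."""
    return math.sqrt(k_s / weight)


w = sample_weight(0.50, 0.2)          # Example 3.24: K_s = 0.50 g, R = 0.2%
rsd_one_gram = sampling_rsd(0.50, 1.0)

print(f"weight for 0.2% rsd: {w:.1f} g")
print(f"rsd for a 1-g sample: {rsd_one_gram:.2f}%")
```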
MINIMUM NUMBER OF SAMPLES
The number of individual sample increments needed to achieve a given level of
confidence in the analytical results is estimated by
$$n = \frac{t^2 s_s^2}{r^2 \bar{x}^2} \qquad (3.33)$$
where t is the Student t value for the confidence level desired, $s_s^2$ is the sampling
variance, r is the acceptable relative standard deviation of the average of the
analytical results, $\bar{x}$; $s_s$ is the absolute standard deviation, in the same units
as $\bar{x}$, and so n is unitless. Values of $s_s$ and $\bar{x}$ are obtained from
preliminary measurements or prior knowledge. Since $s_s/\bar{x}$ is equal to the
relative standard deviation of sampling, $s_r$, we can write that
$$n = \frac{t^2 s_r^2}{r^2} \qquad (3.34)$$
Since n is initially unknown, the t value for the given confidence level is initially
estimated and an iterative procedure is used to calculate n.
Example 3.25
The iron content in a blended lot of bulk ore material is about 5% (wt/wt), and the
relative standard deviation of sampling, $s_r$, is 0.021 (2.1% rsd). How many samples
should be taken in order to obtain a relative standard deviation, r, of 0.016
(1.6% rsd) in the results at the 95% confidence level [i.e., the standard deviation,
s, for the 5% iron content is 0.08% (wt/wt)]?
Solution
We can use either Equation 3.33 or 3.34. We will use the latter. Set t = 1.96
(for n = ∞, Table 3.1) at the 95% confidence level. Calculate a preliminary value
of n. Then use this n to select a closer t value, and recalculate n; continue
iteration to constant n.
$$n = \frac{(1.96)^2(0.021)^2}{(0.016)^2} = 6.6$$
For n = 7, t = 2.365,
$$n = \frac{(2.365)^2(0.021)^2}{(0.016)^2} = 9.6$$
Repeating the calculation with the t value for n = 10 gives n ≈ 9, and a further
iteration leaves n unchanged at 9.
See if you get the same result using Equation 3.33.
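The iteration in Example 3.25 is a natural candidate for a short program. The Python sketch below is one possible arrangement, offered with several stated assumptions: the function names are hypothetical; the small table holds standard two-sided 95% Student t values; t is looked up using n itself as the table entry, mirroring the worked example (t = 2.365 for n = 7); values of n beyond the table fall back to 1.96; and each estimate is rounded up to the next whole sample.

```python
import math

# Standard two-sided 95% Student t values, keyed as in the worked example
T_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131,
    16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093, 20: 2.086,
}
T_INFINITE = 1.96


def t_value(n):
    """Look up t for n; outside the table use the large-n value (an approximation)."""
    return T_95.get(n, T_INFINITE)


def samples_needed(s_r, r, max_iter=20):
    """Iterate n = t^2 s_r^2 / r^2 (Equation 3.34) until n stops changing."""
    t = T_INFINITE              # first guess, as in the worked example
    previous = None
    for _ in range(max_iter):
        n = math.ceil(t ** 2 * s_r ** 2 / r ** 2)   # round up to whole samples
        if n == previous:
            return n
        previous = n
        t = t_value(n)          # closer t value for the next pass
    return previous             # give up after max_iter passes


n_samples = samples_needed(s_r=0.021, r=0.016)
print(f"samples required: {n_samples}")
```

The `max_iter` guard is there because, for some inputs, successive estimates can alternate between two neighboring values instead of settling.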
Equation 3.33 holds for a Gaussian distribution of analyte concentration
within the bulk material, that is, it will be centered around $\bar{x}$ with 68% of the
values falling within one standard deviation, or 95% within two standard deviations.
In this case, the variance of the population, $\sigma^2$, is small compared to the true
value. If the concentration follows a Poisson distribution, that is, follows a random
distribution in the bulk material such that the true or mean value $\bar{x}$
approximates the variance, $s_s^2$, of the population, then Equation 3.33 is somewhat
simplified:
$$n = \frac{t^2}{r^2} \cdot \frac{1}{\bar{x}} \qquad (3.35)$$
Note that since $s_s^2$ is equal to $\bar{x}$, the right-hand part of the expression
becomes equal to $1/\bar{x}$, but the units do not cancel. In this case, when the
concentration distribution is broad rather than narrow, many more samples are
required to get a representative result from the analysis.
If the analyte occurs in clumps or patches, the sampling strategy becomes
more complicated. The patches can be considered as separate strata and sampled
separately. If bulk materials are segregated or stratified, and the average
composition is desired, then the number of samples from each stratum should be in
proportion to the size of the stratum.
Learning Objectives
WHAT ARE SOME OF THE KEY THINGS WE LEARNED
FROM THIS CHAPTER?
• Accuracy and precision
• Types of errors in measurements
• Significant figures in measurements and calculations
• Standard deviation
• How to use spreadsheets
• Propagation of errors
• Control charts
• Statistics: confidence limits, t tests, F tests
• Rejection of a result
• Least-squares plots and coefficient of determination
• Using spreadsheets for plotting calibration curves
• Detection limits
• Statistics of sampling
Questions
1. Distinguish between accuracy and precision.
2. What is determinate error? An indeterminate error?
3. The following is a list of common errors encountered in research laboratories.
Categorize each as a determinate or an indeterminate error, and further
categorize determinate errors as instrumental, operative, or methodic: (a) An
unknown being weighed is hygroscopic. (b) One component of a mixture being
analyzed quantitatively by gas chromatography reacts with the column packing.
(c) A radioactive sample being counted repeatedly without any change in
conditions yields a slightly different count at each trial. (d) The tip of the pipet
used in the analysis is broken. (e) In measuring the same peak heights of a
chromatogram, two technicians each report different heights.
Problems
For the statistical problems, do the calculations manually first and then use the
Excel statistical functions and see if you get the same answers. See the CD,
Problems 14-18, 20, 21, 25-20, and 37-40.
SIGNIFICANT FIGURES
4. How many significant figures does each of the following numbers have?
(4) 200.06, ¢b) 6.030 * 10°, and (€) 7.80 % 108,
5. How many significant figures does each of the following numbers have?
(a) 0.02670, (b) 328.0, (c) 70000, and (d) 0.00200.
6. Calculate the formula weight of LiNO3 to the correct number of significant
figures.
7. Calculate the formula weight of PdCl2 to the correct number of significant
figures.
8. Give the answer to the following problem to the maximum number of
significant figures: 50.00 × 27.8 × 0.1167.
9. Give the answer of the following to the maximum number of significant ig-
tres: (2.776 X 0.0050) ~ (6.3 * 107) + (0.036 X 0.0271),
10. An analyst wishes to analyze spectrophotometrically the copper content in a
bronze sample. If the sample weighs about 5 g and if the absorbance (A) is to
be read to the nearest 0.001 absorbance unit, how accurately should the
sample be weighed? Assume the volume of the measured solution will be adjusted
to obtain minimum error in the absorbance, that is, so that 0.1