Errors 2

This document discusses sources of error in biological experiments. It begins by distinguishing between systematic errors, which can be eliminated, and random errors, which cannot be eliminated but only estimated. Some common sources of random error discussed include reading errors, counting errors, sampling errors, and biological and technical variability. The document provides examples of how different types of errors contribute to measurement uncertainty. It also introduces concepts such as statistical estimators and how measurements from a sample can be used to infer properties of the underlying population.

Uploaded by

Shruthi

Error analysis in biology

Marek Gierliński
Division of Computational Biology

Hand-outs available at http://is.gd/statlec

Errors, like straws, upon the surface flow;
He who would search for pearls must dive below
John Dryden (1631-1700)
Previously on Errors…
• Random variable: result of an experiment
• Probability distribution: how random values are distributed
• Discrete and continuous probability distributions

Gaussian (normal) distribution
• very common
• 95% probability within 𝜇 ± 1.96𝜎

Poisson (count) distribution
• random and independent events
• mean = variance
• approximates Gaussian for large 𝑛

Binomial distribution
• probability of 𝑘 successes out of 𝑛 trials
• toss a coin
• approximates Gaussian for large 𝑛
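The 95% figure quoted for the Gaussian distribution can be checked numerically. The sketch below uses the standard relation between the normal distribution and the error function; the function name `prob_within` is illustrative and not part of the lecture.

```python
import math


def prob_within(z: float) -> float:
    """Probability that a Gaussian variable lies within z standard deviations of its mean."""
    # For a standard normal variable X, P(|X| <= z) = erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2.0))


p_196 = prob_within(1.96)
print(f"P(|x - mu| <= 1.96 sigma) = {p_196:.4f}")
```

The same function gives the probability for any other multiple of 𝜎, for example `prob_within(1.0)` for the one-sigma interval.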
Example
• Take one mouse and weigh it
  • Result: 18.21 g
  • Reading error
• Take five mice and find the mean weight
  • Result: 18.81 g
  • Sampling error
• These are examples of measurement errors
2. Measurement errors

“If your experiment needs statistics, you ought to have done a better experiment”
Ernest Rutherford
Different types of errors

Systematic errors
• Incorrect instrument calibration
• Model uncertainties
• Change in experimental conditions
• Mistakes!
Systematic errors can be eliminated in good experiments.

Random errors
• Reading errors
• Sampling errors
• Counting errors
• Background noise
• Intrinsic variability
• Sensitivity limits
You can’t eliminate random errors, you have to live with them. You can estimate (and reduce) random error by taking multiple measurements.
Random measurement error
• Determine the strength of oxalic acid in a sample
• Method: find the volume of NaOH solution required to neutralize a given volume of the acid by observing a phenolphthalein indicator
• Uncertainties contributing to the final result
  • volume of the acid sample
  • judgement of the point at which the acid is neutralized
  • volume of NaOH solution used at this point
  • accuracy of NaOH concentration
    • weight of solid NaOH dissolved
    • volume of water added
• Each of these uncertainties adds a random error to the final result
A model of random measurement error
• Laplace 1783
• Consider a measurement of a certain quantity
• Its unknown true value is 𝑚0
• Measurement is perturbed by small uncertainties
• Each of them contributes a small random deviation, ±𝜀, to the measured value
• This creates a binomial distribution
• For a large number of contributions, 𝑛, it approximates a Gaussian
• We expect random measurement errors to be normally distributed
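The Laplace model can be tried out with a small simulation. This is a minimal sketch, assuming each uncertainty shifts the result by +𝜀 or −𝜀 with equal probability; the numerical values of the true value, 𝜀 and the number of contributions are illustrative and not taken from the lecture.

```python
import math
import random
import statistics


def simulate_measurement(m0: float, eps: float, n_sources: int, rng: random.Random) -> float:
    """One measurement: the true value plus n_sources deviations of +eps or -eps."""
    return m0 + sum(rng.choice((-eps, eps)) for _ in range(n_sources))


rng = random.Random(1)                  # fixed seed so the sketch is reproducible
m0, eps, n_sources = 10.0, 0.1, 100     # illustrative values (assumptions)
measurements = [simulate_measurement(m0, eps, n_sources, rng) for _ in range(5000)]

mean = statistics.fmean(measurements)
sd = statistics.stdev(measurements)
expected_sd = eps * math.sqrt(n_sources)  # SD of a sum of n independent +/-eps steps

# Fraction of measurements within 1.96 SD of the mean (close to 95% for a Gaussian)
within = sum(abs(x - mean) <= 1.96 * sd for x in measurements) / len(measurements)

print(f"mean = {mean:.3f}, SD = {sd:.3f} (expected {expected_sd:.3f})")
print(f"fraction within 1.96 SD = {within:.3f}")
```

The simulated measurements scatter around the true value with a spread close to 𝜀√𝑛, and roughly 95% of them fall within 1.96 standard deviations, as expected for a near-Gaussian distribution.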
Biological and technical variability

Biological variability
• Molecular level
• Phenotype variability
• From subject to subject
• Variability in time
• Life is stochastic!

Technical variability
• Random measurement errors
• Accumulation of errors

• In most experiments biological variability dominates
• It is hard to disentangle the two types of variability
Sampling error
• Repeated measurements give us
  • mean value
  • variability scale
• Sampling from a population
  • Measure the body weight of a mouse
  • Sample: 5 mice
  • Population: all mice on the planet
• Small sample size introduces uncertainty

Body weight of 5 mice (g)            Mean (g)
20.38  20.73  23.24  15.39  12.58    18.5
27.48  12.52  21.95  12.54  21.19    19.1
14.73  16.37  28.21  21.18  13.48    18.9
Reading error
• When you do one simple measurement using
  • ruler
  • micrometer
  • voltmeter
  • thermometer
  • measuring cylinder
  • stopwatch
• The reading error is half of the smallest division
• A ruler with a 1-mm scale can give a reading of 230.5 mm
• Beware of digital instruments that sometimes give readings much better than their real accuracy
• Read the instruction manual!
• Reading error does not take into account biological variability
Counting error
• Dilution plating of bacteria
• Counted 𝐶 = 17 colonies on a plate at the $10^{-5}$ dilution
• Counting statistics: Poisson distribution
  $\sigma = \sqrt{\mu}$
• Use standard deviation as error estimate
  $S = \sqrt{C} = \sqrt{17} \approx 4$
  $C = 17 \pm 4$
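The counting error for the plate above is a one-line calculation. The sketch below assumes Poisson counting statistics, as stated on the slide; the helper name `counting_error` is illustrative.

```python
import math


def counting_error(count: int) -> float:
    """Poisson error estimate for a single count: the square root of the count."""
    return math.sqrt(count)


colonies = 17
error = counting_error(colonies)
print(f"C = {colonies} ± {error:.0f}")
```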
Counting error
• Gedankenexperiment
  • True mean count, 𝜇 = 11
  • Measure counts on 10,000 plates (!)
• Plot counts, $C_i$, and their errors, $S_i = \sqrt{C_i}$
• Plot distribution of counts from 10,000 plates and its mean, 𝜇, and standard deviation, 𝜎
• Counting errors, $S_i = \sqrt{C_i}$, are similar, but not identical, to 𝜎
• $C_i$ is an estimator of 𝜇
• $S_i$ is an estimator of 𝜎
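The 10,000-plate thought experiment can be reproduced in a few lines. This is a sketch only: the Python standard library has no Poisson generator, so counts are drawn with Knuth's multiplication method, which is my choice here and not something the lecture specifies.

```python
import math
import random
import statistics


def poisson_sample(mu: float, rng: random.Random) -> int:
    """Draw one Poisson-distributed count with mean mu (Knuth's method, fine for small mu)."""
    limit = math.exp(-mu)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


rng = random.Random(42)   # fixed seed for reproducibility
mu = 11                   # true mean count from the slide
counts = [poisson_sample(mu, rng) for _ in range(10_000)]

mu_hat = statistics.fmean(counts)              # mean of all counts
sigma_hat = statistics.stdev(counts)           # spread of the count distribution
errors = [math.sqrt(c) for c in counts]        # per-plate error estimates S_i
mean_error = statistics.fmean(errors)

print(f"mean count = {mu_hat:.2f}, SD of counts = {sigma_hat:.2f}")
print(f"sqrt(mu) = {math.sqrt(mu):.2f}, mean of S_i = {mean_error:.2f}")
```

Each individual error estimate differs from plate to plate, but on average the estimates sit close to the standard deviation of the full distribution of counts.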
Exercise: is Dundee a murder capital of Scotland?
• On 2 October 2013 The Courier published an article “Dundee is murder capital of Scotland”
• Data in the article (2012/2013):

City        Murders   Per 100,000
Dundee      6         4.1
Glasgow     19        3.2
Aberdeen    2         0.88
Edinburgh   2         0.41

• Compare Dundee and Glasgow
• Find errors on murder rates
• Hint: find errors on murder count first
Exercise: is Dundee a murder capital of Scotland?

City      Murders   Per 100,000
Dundee    6         4.1
Glasgow   19        3.2

• Counting errors on the murder counts
  $\Delta C_D = \sqrt{6} \approx 2.4$
  $\Delta C_G = \sqrt{19} \approx 4.4$
• Errors scale with variables, so we can use fractional errors
  $\Delta C_D / C_D = 0.41$
  $\Delta C_G / C_G = 0.23$
• and apply them to the murder rate
  $\Delta R_D = 4.1 \times 0.41 = 1.7$
  $\Delta R_G = 3.2 \times 0.23 = 0.74$
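The exercise can be worked through in code. The sketch below follows the steps on the slide (Poisson error on the count, fractional error, then scaling to the rate); the function and variable names are illustrative. Because it does not round the intermediate fractional error, the Glasgow result comes out very slightly below the slide's 0.74.

```python
import math


def rate_error(count: int, rate: float) -> float:
    """Error on a rate, assuming the count has a Poisson error of sqrt(count)."""
    fractional_error = math.sqrt(count) / count
    return rate * fractional_error


# City: (murders, rate per 100,000), from the article quoted in the exercise
cities = {"Dundee": (6, 4.1), "Glasgow": (19, 3.2)}

errors = {}
for city, (murders, rate) in cities.items():
    errors[city] = rate_error(murders, rate)
    print(f"{city}: {rate} ± {errors[city]:.2f} per 100,000")

# The difference between the two rates is smaller than Dundee's error alone
difference = cities["Dundee"][1] - cities["Glasgow"][1]
print(f"difference = {difference:.1f}")
```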
Exercise: is Dundee a murder capital of Scotland?

City        Murders   Per 100,000   p-value vs Dundee
Dundee      6         4.1
Glasgow     19        3.2           0.8
Aberdeen    2         0.88          0.04
Edinburgh   2         0.41          0.002

[Figure: murder rates with 95% confidence intervals (Lecture 4); p-values from chi-square test vs Dundee]
Measurement errors: summary
• Experimental random errors are expected to be normally distributed
• Some errors can be estimated directly
  • reading (scale, gauge, digital read-out)
  • counting
• Other uncertainties require replicates (a sample)
  • this introduces sampling error
Example
• Body mass of 5 mice
• This is a sample
• We can find
  • mean = 18.8 g
  • median = 18.6 g
  • standard deviation = 5.0 g
  • standard error = 2.2 g
• These are examples of statistical estimators
3. Statistical estimators

“The average human has one breast and one testicle”

Des MacHale
Population and sample

[Figure: sample selection]

• Terms nicked from social sciences
• Most biological experiments involve sample selection
• Terms “population” and “sample” are not always literal
What is a sample?
• The term “sample” has different meanings in biology and statistics
• Biology: sample is a specimen, e.g., a cell culture you want to analyse
  • Experiment in 5 biological replicates requires 5 biological samples
  • After quantification (e.g. protein abundance) we get a set of 5 numbers
• Statistics: sample is (usually) a set of numbers (measurements)
• In these talks: 𝑥1, 𝑥2, …, 𝑥𝑛

[Figure: biological samples (specimens), quantification, statistical sample (set of numbers): 1.32, 1.12, 0.98, 0.80, 1.07]
Population and sample

Population
Population can be a somewhat abstract concept
Huge size, impossible to handle
• all mice on Earth
• all people with eczema
• all possible measurements of gene expression (infinite population)

Sample
Sample is what you get from your experiments
Manageable size, 𝑛 measurements
• 12 mice in a particular experiment
• 26 patients with eczema
• 5 biological replicates to measure gene expression
Population and sample
• Population: unknown parameters 𝜇, 𝜎, …
  • A parameter describes a population
• Sample: size 𝑛, known statistics 𝑀, 𝑆𝐷, …
  • A statistical estimator (statistic) describes a sample
  • A statistical estimator approximates the corresponding parameter
Sample size
• Dilution plating experiment: 17 colonies
• What is the sample size?
  𝑛 = 1
• This sample consists of one measurement: 𝑥1 = 17
What is a statistical estimator?

“Stand at the door of a church on a Sunday and bid 16 men to stop, tall ones and small ones, as they happen to pass out when the service is finished; then make them put their left feet one behind the other, and the length thus obtained shall be a right and lawful rood to measure and survey the land with, and the 16th part of it shall be the right and lawful foot.”

“Right and lawful rood*” from Geometrei, by Jacob Köbel (Frankfurt 1575)
*rood – a unit of measure equal to 16 feet

Over 400 years ago Köbel:
• introduced random sampling from a population
• required a representative sample
• defined standardized units of measure
• used 16 replicates to minimize random error
• calculated an estimator: the sample mean
Statistical estimators
• Statistical estimator is a sample attribute used to estimate a population parameter
  • 𝑀 estimates 𝜇, 𝑆𝐷 estimates 𝜎
• From a sample 𝑥1, 𝑥2, …, 𝑥𝑛 we can find
  mean: $M = \frac{1}{n} \sum_{i=1}^{n} x_i$
  standard deviation: $SD = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - M)^2}$
  median, proportion, correlation, …
• Example: a sample of 𝑛 = 30 from the population 𝒩(20, 5)
  • 𝑀 = 20.3 g
  • 𝑆𝐷 = 5.2 g
  • 𝑆𝐸 = 0.94 g
  𝑀 = 20.3 ± 0.9 g
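The estimators on this slide are straightforward to compute. The sketch below applies them to the five numbers shown earlier as an example of a statistical sample (1.32, 1.12, 0.98, 0.80, 1.07); using that sample here is my choice for illustration, and the standard error uses the SD/√𝑛 formula from the later slides.

```python
import math
import statistics

# Example statistical sample from the "What is a sample?" slide
sample = [1.32, 1.12, 0.98, 0.80, 1.07]
n = len(sample)

M = statistics.mean(sample)       # sample mean
SD = statistics.stdev(sample)     # sample standard deviation (n - 1 in the denominator)
SE = SD / math.sqrt(n)            # standard error of the mean

print(f"n = {n}")
print(f"M = {M:.3f}")
print(f"SD = {SD:.3f}")
print(f"SE = {SE:.3f}")
print(f"M = {M:.2f} ± {SE:.2f}")
```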
Standard deviation
• Standard deviation is a measure of spread of data points
• Idea:
  • calculate the mean
  • find deviations from the mean of individual points
  • get rid of negative signs
  • combine them together
• Standard deviation of 𝑥1, 𝑥2, …, 𝑥𝑛
  $SD_n = \sqrt{\frac{1}{n} \sum_i (x_i - M)^2}$
  $SD_{n-1} = \sqrt{\frac{1}{n-1} \sum_i (x_i - M)^2}$
  $SD_{n-1}^2$ is an unbiased estimator of variance
• Mean deviation
  $MD = \frac{1}{n} \sum_i |x_i - M|$
  • doesn’t overestimate outliers
  • less accurate than 𝑆𝐷
  • mathematically more complicated
  • tradition: use 𝑆𝐷

[Figure: data points, the sample mean, and deviations from the mean]
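The three measures of spread can be compared side by side. This sketch writes the formulas out explicitly and cross-checks them against the standard library; the data are the first row of mouse body weights from the sampling-error slide, chosen here only for illustration.

```python
import math
import statistics


def spread_measures(xs):
    """Return (SD_n, SD_{n-1}, mean deviation) for a list of numbers."""
    n = len(xs)
    m = sum(xs) / n
    sum_sq = sum((x - m) ** 2 for x in xs)      # squaring removes the negative signs
    sd_n = math.sqrt(sum_sq / n)
    sd_n1 = math.sqrt(sum_sq / (n - 1))
    md = sum(abs(x - m) for x in xs) / n        # absolute value removes them instead
    return sd_n, sd_n1, md


weights = [20.38, 20.73, 23.24, 15.39, 12.58]   # body weights in g
sd_n, sd_n1, md = spread_measures(weights)

print(f"SD_n     = {sd_n:.2f} g (statistics.pstdev: {statistics.pstdev(weights):.2f})")
print(f"SD_(n-1) = {sd_n1:.2f} g (statistics.stdev:  {statistics.stdev(weights):.2f})")
print(f"MD       = {md:.2f} g")
```

For a small sample the difference between the two denominators is noticeable; it shrinks as 𝑛 grows.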
Standard error of the mean
• Gedankenexperiment
  • Consider a population of mice with normally distributed body weight with 𝜇 = 20 g and 𝜎 = 5 g
  • Take a sample of 5 mice
  • Calculate sample mean, 𝑀
  • Repeat many times
  • Plot distributions of sample means

[Figure: samples of 5 mice by sample no.; distribution of sample means (normalized frequency)]
Standard error of the mean
• Gedankenexperiment
  • Consider a population of mice with normally distributed body weight with 𝜇 = 20 g and 𝜎 = 5 g
  • Take a sample of 30 mice
  • Calculate sample mean, 𝑀
  • Repeat many times
  • Plot distributions of sample means

[Figure: samples of 30 mice by sample no.; distribution of sample means (normalized frequency)]
Standard error of the mean
• Distribution of sample means is called sampling distribution of the mean
• The larger the sample, the narrower the sampling distribution
• Sampling distribution is Gaussian, with standard deviation
  $\sigma_m = \frac{\sigma}{\sqrt{n}}$
• Hence, uncertainty of the mean can be estimated by
  $SE = \frac{SD}{\sqrt{n}}$
• Standard error estimates the width of the sampling distribution

[Figure: samples by sample no.; sampling distribution of the mean (normalized frequency)]
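The thought experiment with 5 and 30 mice can be simulated directly. The sketch below draws repeated samples from a normal population with 𝜇 = 20 g and 𝜎 = 5 g, as on the slides, and compares the observed width of the sampling distribution with 𝜎/√𝑛; the number of repeats and the seed are arbitrary choices.

```python
import math
import random
import statistics


def sample_means(mu: float, sigma: float, n: int, repeats: int, rng: random.Random) -> list:
    """Means of `repeats` independent samples of size n drawn from N(mu, sigma)."""
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(repeats)
    ]


rng = random.Random(7)        # fixed seed for reproducibility
mu, sigma = 20.0, 5.0         # population parameters from the slides

widths = {}
for n in (5, 30):
    means = sample_means(mu, sigma, n, repeats=5000, rng=rng)
    widths[n] = statistics.stdev(means)    # width of the sampling distribution
    print(f"n = {n:2d}: SD of sample means = {widths[n]:.3f}, "
          f"sigma / sqrt(n) = {sigma / math.sqrt(n):.3f}")
```

In a real experiment 𝜎 is unknown, which is why the standard error replaces it with the sample 𝑆𝐷.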
Standard deviation and standard error

Standard deviation
  $SD = \sqrt{\frac{1}{n-1} \sum_i (x_i - M)^2}$
• Measure of dispersion in the sample
• Estimates the true standard deviation in the population, 𝜎
• Does not depend on sample size

Standard error
  $SE = \frac{SD}{\sqrt{n}}$
• Error of the mean
• Estimates the width (standard deviation) of the distribution of the sample means
• Gets smaller with increasing sample size
Correlation coefficient
• Two samples: 𝑥1, 𝑥2, …, 𝑥𝑛 and 𝑦1, 𝑦2, …, 𝑦𝑛
  $r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - M_x}{SD_x} \right) \left( \frac{y_i - M_y}{SD_y} \right) = \frac{1}{n-1} \sum_{i=1}^{n} Z_{x_i} Z_{y_i}$
  where 𝑍 is a “Z-score”
• Correlation does not mean causation!
Correlation coefficient: example
  $r = \frac{1}{n-1} \sum_{i=1}^{n} Z_{x_i} Z_{y_i}$

Example 1
𝑥      𝑦      𝑍𝑥      𝑍𝑦      𝑍𝑥𝑍𝑦
0.01   0.01   -1.35   -1.24   1.68
0.24   0.22   -0.64   -0.74   0.48
0.25   0.26   -0.62   -0.64   0.40
0.66   0.75    0.62    0.53   0.33
0.75   0.98    0.89    1.09   0.97
0.81   0.95    1.10    1.02   1.11
$\sum Z_x Z_y = 4.96$, so with 𝑛 = 6, 𝑟 = 4.96 / 5 ≈ 0.99

Example 2
𝑥      𝑦      𝑍𝑥      𝑍𝑦      𝑍𝑥𝑍𝑦
0.45   0.74   -1.72    0.57   -0.98
0.60   0.19   -0.54   -0.72    0.39
0.68   0.00    0.05   -1.14   -0.06
0.73   0.98    0.47    1.14    0.54
0.77   0.15    0.77   -0.81   -0.63
0.80   0.90    0.96    0.95    0.92
$\sum Z_x Z_y = 0.18$, so with 𝑛 = 6, 𝑟 = 0.18 / 5 ≈ 0.04
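The two worked examples can be recomputed from the raw 𝑥 and 𝑦 columns. This sketch implements the Z-score form of the correlation coefficient given in the lecture; the function name `pearson_r` is illustrative.

```python
import statistics


def pearson_r(xs, ys):
    """Correlation coefficient as the sum of Z-score products divided by n - 1."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sdx, sdy = statistics.stdev(xs), statistics.stdev(ys)   # n - 1 denominators
    return sum(((x - mx) / sdx) * ((y - my) / sdy) for x, y in zip(xs, ys)) / (n - 1)


# First example table: strongly correlated data
x1 = [0.01, 0.24, 0.25, 0.66, 0.75, 0.81]
y1 = [0.01, 0.22, 0.26, 0.75, 0.98, 0.95]

# Second example table: almost no correlation
x2 = [0.45, 0.60, 0.68, 0.73, 0.77, 0.80]
y2 = [0.74, 0.19, 0.00, 0.98, 0.15, 0.90]

r1 = pearson_r(x1, y1)
r2 = pearson_r(x2, y2)
print(f"r1 = {r1:.2f}")
print(f"r2 = {r2:.2f}")
```

Small differences from the tabulated sums are expected, because the tables show Z-scores rounded to two decimal places.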
Statistical estimators

Central point
• Mean
• Geometric mean
• Harmonic mean
• Median
• Mode
• Trimmed mean

Dispersion
• Variance
• Standard deviation
• Mean deviation
• Range
• Interquartile range
• Mean difference

Symmetry
• Skewness
• Kurtosis

Dependence
• Pearson’s correlation
• Rank correlation
• Distance
Hand-outs available at http://is.gd/statlec

Please leave your feedback forms on the table by the door
