MECHANICAL
ENGINEERING SYSTEMS
LABORATORY
Group 02
Asst. Prof. Dr. E. İlhan KONUKSEVEN
STATISTICAL TREATMENT OF
EXPERIMENTAL DATA
DISCRETE FREQUENCY DISTRIBUTIONS
Assume that a total of n=10 measurements, xi (i=1,…,10)
are made as:
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
14 16 13 19 18 14 14 15 18 15
Note that the span of measurements is 6, ranging from
13 to 19.
FREQUENCY ( nj )
IS THE NUMBER OF OCCURRENCES OF
THE jth MEASUREMENT VALUE
In this example, frequencies are:
j 1 2 3 4 5 6 7
value 13 14 15 16 17 18 19
nj 1 3 2 1 0 2 1
RELATIVE FREQUENCY fj
IS THE NUMBER OF OCCURRENCES OF THE jth VALUE
RELATIVE TO THE TOTAL NUMBER OF OCCURRENCES
$f_j = \frac{n_j}{n}$, with $\sum_{j=1}^{m} f_j = 1$ and $\sum_{j=1}^{m} n_j = n$
THERE ARE 7 GROUPS, i.e. m = 7
j 1 2 3 4 5 6 7
value 13 14 15 16 17 18 19
fj 0.1 0.3 0.2 0.1 0.0 0.2 0.1
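The frequency and relative frequency tables can be reproduced with a short Python sketch. The variable names (`x`, `values`, `n_j`, `f_j`) are chosen here for illustration and are not part of the original slides.

```python
from collections import Counter

# The ten measurements from the example
x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)

counts = Counter(x)                         # occurrences of each measured value
values = list(range(min(x), max(x) + 1))    # 13..19, i.e. m = 7 groups
n_j = [counts.get(v, 0) for v in values]    # frequency of each value (0 if absent)
f_j = [c / n for c in n_j]                  # relative frequency n_j / n

for v, c, f in zip(values, n_j, f_j):
    print(v, c, f)
```

The relative frequencies sum to 1 and the frequencies sum to n, as the definitions require.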
Frequency Graph: These measurements may be shown
graphically on a histogram called “Frequency Graph”
as follows:
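[Figure: Frequency Graph. Horizontal axis: measured value, 13 to 19. Left vertical axis: frequency nj, 0 to 4. Right vertical axis: relative frequency fj, 0.0 to 0.4. Each measurement x1 to x10 is stacked above its value.]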
MEASURES OF CENTRAL TENDENCY
ARITHMETIC MEAN (Average)
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
IT PROVIDES THE BEST ESTIMATE FOR AN UNBIASED
DISTRIBUTION OF DATA
$\bar{x}$ is the most commonly used measure of central tendency because
it usually provides the “best estimate” of the most typical value in
the distribution of data.
($\bar{x}$ = 15.6 for the last example)
BIAS:
In statistics, bias is systematic favoritism (a tendency to make
systematic errors) present in the data collection, analysis, or reporting of
quantitative research.
MEDIAN
IT IS THE VALUE AT THE MIDDLE POSITION OF A
DISTRIBUTION OF DATA
IT IS USUALLY USED WHEN THE DISTRIBUTION
IS BIASED
The median is the middle value of the data when sorted in ascending
order. When the number of data points is even, the median is the
average of the two middle values.
13, 14, 14, 14, 15, 15, 16, 18, 18, 19
(It is (15 + 15)/2 = 15 for the last example)
MODE
IT IS THE VALUE HAVING THE HIGHEST
FREQUENCY
IN THE SAMPLE DISTRIBUTION
(It is not very meaningful unless n is large)
(It is 14 for the last example)
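As a quick check of the three values quoted so far (mean, median, mode), the following sketch uses Python's standard `statistics` module. It is only an illustration; the slides do not prescribe any software.

```python
import statistics

# The ten measurements from the example
x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]

mean = statistics.mean(x)      # arithmetic mean: sum(x) / n
median = statistics.median(x)  # n is even, so the two middle values are averaged
mode = statistics.mode(x)      # value with the highest frequency

print(mean, median, mode)
```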
GEOMETRIC MEAN (Log - Mean)
$x_g = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$
$\log(x_g) = \frac{1}{n} \sum_{i=1}^{n} \log(x_i)$
IT IS IMPORTANT WHEN DEALING WITH
RATIOS OR PERCENTAGES
(It is 15.5 for the last example)
HARMONIC MEAN
$x_h = \frac{n}{\sum_{i=1}^{n} (1 / x_i)}$
(It is 15.4 for the last example)
QUADRATIC MEAN
(ROOT - MEAN - SQUARE )
$x_{rms} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}$
Its square can be considered as the second moment of a set of
data about its origin. (It is 15.7 for the last example)
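The geometric, harmonic, and quadratic means can be checked against the quoted values with a minimal Python sketch. The names `x_g`, `x_h`, and `x_rms` mirror the symbols above; the geometric mean is computed through the log-mean form to match the second formula.

```python
import math

# The ten measurements from the example
x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)

x_bar = sum(x) / n                                  # arithmetic mean
x_g = math.exp(sum(math.log(v) for v in x) / n)     # geometric mean via the log-mean
x_h = n / sum(1.0 / v for v in x)                   # harmonic mean
x_rms = math.sqrt(sum(v * v for v in x) / n)        # quadratic (root-mean-square) mean

print(round(x_g, 1), round(x_h, 1), round(x_rms, 1))
```

For positive data the four means are ordered as harmonic ≤ geometric ≤ arithmetic ≤ quadratic, which the example values (15.4, 15.5, 15.6, 15.7) illustrate.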
MEASURES OF DISPERSION OF DATA
VARIANCE
(MEAN SQUARE DEVIATION )
$\mathrm{VAR} = \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
It is 3.84 for the last example
STANDARD DEVIATION
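$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\overline{x_i^2} - (\bar{x})^2}$
where $\overline{x_i^2}$ is the mean of the squared values.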
It is 1.96 for the last example
RANGE
IT IS THE DIFFERENCE BETWEEN
THE LARGEST AND SMALLEST
VALUES OF THE ENTIRE SET OF
DATA
(It is 6 for the last example)
AVERAGE DEVIATION
$A.D. = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$
It is 1.72 for the last example
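The four measures of dispersion can be verified numerically with the sketch below. The variable names are illustrative, and `sigma_alt` uses the "mean of squares minus square of mean" form of the standard deviation.

```python
import math

# The ten measurements from the example
x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)
x_bar = sum(x) / n

var = sum((v - x_bar) ** 2 for v in x) / n          # variance (mean square deviation)
sigma = math.sqrt(var)                              # standard deviation
sigma_alt = math.sqrt(sum(v * v for v in x) / n - x_bar ** 2)  # equivalent form
data_range = max(x) - min(x)                        # range
avg_dev = sum(abs(v - x_bar) for v in x) / n        # average deviation

print(round(var, 2), round(sigma, 2), data_range, round(avg_dev, 2))
```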
UNBIASED ESTIMATES
If a “random sample” (x1, x2, … , xn) is drawn from a “population”
(or “universe”) with mean $\mu$ and standard deviation $\sigma$, then:
A) THE SAMPLE MEAN $\bar{x}$ IS THE BEST AVAILABLE ESTIMATE
OF THE UNKNOWN MEAN OF THE UNIVERSE
B) THE BEST AVAILABLE ESTIMATE OF THE UNKNOWN
STANDARD DEVIATION OF THE UNIVERSE IS GIVEN BY
$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\frac{n}{n-1} \left[ \overline{x_i^2} - (\bar{x})^2 \right]}$
where $\overline{x_i^2}$ is the mean of the squared values.
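THE USE OF THIS EXPRESSION BECOMES
IMPORTANT ESPECIALLY WHEN n IS SMALL
FOR LARGE VALUES OF n, $s \approx \sigma_{sample}$
HOWEVER, $s > \sigma_{sample}$ ALWAYS
($\sigma_{sample}$ IS THE STANDARD DEVIATION OF THE SAMPLE, COMPUTED WITH 1/n)
(For the last example, s = 2.07)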
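The difference between the 1/n and 1/(n-1) forms can be seen numerically in the following sketch. It relies on `statistics.pstdev` (divides by n) and `statistics.stdev` (divides by n-1) from the Python standard library; `s_alt` is the second form of the expression, written out by hand for comparison.

```python
import math
import statistics

# The ten measurements from the example
x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)
x_bar = sum(x) / n

sigma_sample = statistics.pstdev(x)   # divides by n
s = statistics.stdev(x)               # divides by n - 1 (estimate for the universe)

# Equivalent form: sqrt( n/(n-1) * (mean of squares - square of mean) )
s_alt = math.sqrt(n / (n - 1) * (sum(v * v for v in x) / n - x_bar ** 2))

print(round(sigma_sample, 2), round(s, 2))
```

With n = 10 the two estimates differ by about 5 % (1.96 versus 2.07); the ratio is sqrt(n/(n-1)), which approaches 1 as n grows.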
C) IF MORE THAN ONE (SAY m) EQUAL-SIZED RANDOM
SAMPLES ARE DRAWN FROM THE SAME UNIVERSE, THEN
THEIR RESPECTIVE MEANS AND STANDARD DEVIATIONS ARE
EXPECTED TO BE APPROXIMATELY EQUAL TO EACH OTHER
$\bar{x}_1 \approx \bar{x}_2 \approx \dots \approx \bar{x}_m$
$s_1 \approx s_2 \approx \dots \approx s_m$
[Figure: Sample 1, Sample 2, ..., Sample m drawn from the same Population or Universe]
It is also possible to treat $\bar{x}_j$ and $s_j$ as statistical quantities and
define their standard deviations
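STANDARD ERROR OF THE MEAN
$s_{\bar{x}} = \frac{s}{\sqrt{n}}$
THIS QUANTITY REPRESENTS THE STANDARD
DEVIATION OF $\bar{x}$ FROM THE UNIVERSE MEAN $\mu$
(For the last example, $s_{\bar{x}}$ = 0.653; 0.655 if s is first rounded to 2.07)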
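STANDARD ERROR OF THE
STANDARD DEVIATION
$s_s = \frac{s}{\sqrt{2n}} = \frac{s_{\bar{x}}}{\sqrt{2}}$
THIS QUANTITY REPRESENTS THE STANDARD
DEVIATION OF s FROM THE UNIVERSE STANDARD DEVIATION $\sigma$
(For the last example, $s_s$ = 0.462; 0.463 if s is first rounded to 2.07)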
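Both standard errors follow directly from s and n. The sketch below evaluates them at full precision for the example data; the names `s_xbar` and `s_s` are illustrative stand-ins for the symbols above.

```python
import math
import statistics

# The ten measurements from the example
x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)

s = statistics.stdev(x)           # unbiased estimate, divides by n - 1
s_xbar = s / math.sqrt(n)         # standard error of the mean
s_s = s / math.sqrt(2 * n)        # standard error of the standard deviation

print(round(s_xbar, 3), round(s_s, 3))
```

Because both quantities scale as 1/sqrt(n), quadrupling the number of measurements halves each standard error.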
CONTINUOUS DISTRIBUTIONS
IN ACTUAL EXPERIMENTS, MEASURED VALUES WILL BE LESS
DISCRETE, e.g.
23.26 , 25.12 , etc
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
14 16 13 19 18 14 14 15 18 15
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
14.21 16.36 13.16 18.74 17.59 14.43 14.02 14.77 18.01 15.16
IF WE HAD A SET OF 100 DATA VALUES SUCH AS
23.26 , 25.12 ... , etc THEN THE FREQUENCY GRAPH
WOULD PROBABLY HAVE VERY FEW VALUES THAT
WERE THE SAME
[Figure: relative frequency fj (0.0 to 0.2) versus measured value (13 to 19), shown as individual dots]
THE ONLY MEANINGFUL QUANTITY
APPEARS TO BE THE DENSITY OF THE “DOTS”
LET US DIVIDE THE
DATA INTO
INCREMENTS (INTERVALS)
NOW LET US COUNT
HOW MANY DATA
POINTS ARE BETWEEN
22.51 AND 23.50
If all intervals of interest are plotted, the result would
be a bar graph as:
[Figure: bar graph of frequency nj (0 to 4) and relative frequency fj (0.0 to 0.4) versus measured value (13 to 19) for the ten integer measurements x1 to x10]
IF MORE MEASUREMENTS WITH A MORE
ACCURATE DEVICE WERE TAKEN
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
14.21 16.36 13.16 18.74 17.59 14.43 14.02 14.77 18.01 15.16
[Figure: relative frequency fj (0.0 to 0.2) versus measured value (13 to 19)]
AND IF THE DATA WERE INCREASED
[Figure: relative frequency fj (0.00 to 0.10) versus measured value (13 to 19)]
When all intervals of interest are plotted, the result would be a
bar graph as:
[Figure: bar graph of relative frequency fj (0.00 to 0.08) versus measured value (13 to 19), with the envelope of the bars marked]
THE INTERVAL MUST BE CHOSEN
* LARGE ENOUGH TO BE
MEANINGFUL
* SMALL ENOUGH
TO GIVE DETAIL
N = 5 log n for large n
N = 1 + 3.3 log n for n < 25 (Sturges rule)
where n is the number of data points, N is the
suggested number of class intervals, and log is the base-10 logarithm.
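The interval rules and the counting step can be sketched in Python as follows. Several choices here are assumptions made for illustration and are not from the slides: the cutoff n = 25 is used as the boundary between the two rules (the slides only say "large n"), N is rounded to the nearest integer, the axis limits 13 and 19 are taken as the data range, and the function names are hypothetical.

```python
import math

def suggested_intervals(n):
    """Suggested number of class intervals N (base-10 logarithm).

    Assumption: the 'large n' rule is applied for n >= 25.
    """
    if n < 25:
        N = 1 + 3.3 * math.log10(n)   # Sturges rule
    else:
        N = 5 * math.log10(n)         # rule quoted for large n
    return max(1, round(N))

def relative_frequencies(data, lo, hi, N):
    """Relative frequency of data in N equal intervals spanning [lo, hi]."""
    width = (hi - lo) / N
    counts = [0] * N
    for v in data:
        j = min(int((v - lo) / width), N - 1)  # a value equal to hi goes in the last interval
        counts[j] += 1
    return [c / len(data) for c in counts]

# The ten finer-resolution measurements from the example
x = [14.21, 16.36, 13.16, 18.74, 17.59, 14.43, 14.02, 14.77, 18.01, 15.16]

N = suggested_intervals(len(x))             # 1 + 3.3 * log10(10) = 4.3, rounded to 4
f = relative_frequencies(x, 13.0, 19.0, N)  # interval width (19 - 13) / 4 = 1.5
print(N, f)
```

With only ten points the histogram is coarse (four intervals of width 1.5); the same functions applied to a larger data set give more intervals and a smoother envelope.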