8/27/2011
Chapter 3. STATISTICAL DESCRIPTION OF DATA
Frequency Distributions
Grouped Data
Percentiles, Deciles & Quartiles
Graphical Representations
Symmetry and Skewness
Objectives
Set up a frequency distribution for a mass of data.
Calculate the mean, median and mode for grouped data.
Calculate and interpret other measures of location such as deciles, quartiles and percentiles.
Calculate the standard deviation, variance, mean deviation and quartile deviation for grouped data.
Construct histograms, bar charts, frequency polygons, pie charts and ogives.
Describe a given set of data in terms of skewness and kurtosis.
Statistical data should be arranged in a manner that allows a reader to distinguish their essential features. Depending on the type of data and the objectives of the person presenting the information, data may be presented in one or a combination of three forms.
Three Forms of Presenting Data
Textual Form - data is presented in paragraph form, especially when the data are purely qualitative or when very few numbers are involved.
Tabular Form - data is presented in rows and columns.
Graphical Form - data is presented in visual form.
[Figure: sample graph of monthly values (0 to 4000), January to December, for the years 1991 to 1995]

Frequency Distribution
When the data include a large number of observations, it is convenient to group the values into mutually exclusive classes and to show the number of observations occurring in each class in tabular form. A frequency distribution is an arrangement of data that shows the frequency of occurrence of values falling within arbitrarily defined ranges of the variable, known as class intervals. The smallest and largest values that fall in a given interval are called class limits.
Steps in Making a Frequency Distribution
1. Find the range.
2. Determine the interval size by dividing the range by the desired number of classes, which is normally not less than 10 and not more than 20.
3. Determine the class limits of the class intervals. Tabulation is facilitated if the lower class limits of the class intervals are multiples of the class size. The bottom interval must include the lowest score.
4. List the intervals, beginning at the bottom.
5. Tally the frequencies.
6. Summarize these under a column labeled f. Total this column and record the number at the bottom.

Class Frequency and Class Mark
Class frequency refers to the number of observations falling in a particular class, while the midpoint between the upper and lower class limits is called the class mark (midpoint).

Problem:
Construct a frequency distribution of the given scores on a test.
56 28 42 56 47 39 62 60 54 47 78 82 55 56 41 44 54 42 62 48 62 38 57 55 50 47 42 56 68 53 37 72 65 66 52 52 48 48 42 68
Solution:
Computing for the range:
R = 82 - 28 = 54
Computing for the class interval:
i = R / (desired number of classes) = 54/10 = 5.4
Therefore, the class interval may be 5 or 6. We choose 5 because it is an odd number, so the class marks are whole numbers. If i = 5, the lowest limit should be 25. We choose 25 because it is the largest multiple of the chosen interval that does not exceed the smallest value in the set (28). If the lowest limit is 25, the bottom interval is 29 - 25, which contains the lowest score (28).

Classes   Tally      f
84 - 80   /          1
79 - 75   /          1
74 - 70   /          1
69 - 65   ////       4
64 - 60   ////       4
59 - 55   ///////    7
54 - 50   //////     6
49 - 45   //////     6
44 - 40   //////     6
39 - 35   ///        3
34 - 30              0
29 - 25   /          1
          N = Σf =  40
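The steps and the solution above can be sketched in Python. This is a minimal illustration that assumes integer scores, so a class with lower limit L and interval i covers the scores L through L + i - 1. The helper name `frequency_distribution` is hypothetical. The interval size (5) and lowest limit (25) are the choices made in the solution.

```python
# Minimal sketch of the frequency-distribution steps (assumes integer scores,
# so a class with lower limit L and interval i covers L, L+1, ..., L+i-1).
scores = [56, 28, 42, 56, 47, 39, 62, 60, 54, 47,
          78, 82, 55, 56, 41, 44, 54, 42, 62, 48,
          62, 38, 57, 55, 50, 47, 42, 56, 68, 53,
          37, 72, 65, 66, 52, 52, 48, 48, 42, 68]


def frequency_distribution(data, i, lowest_limit):
    """Return (lower_limit, upper_limit, f) tuples, lowest class first."""
    table = []
    lower = lowest_limit
    while lower <= max(data):              # add classes until the highest score is covered
        upper = lower + i - 1
        f = sum(1 for x in data if lower <= x <= upper)   # tally this class
        table.append((lower, upper, f))
        lower += i
    return table


data_range = max(scores) - min(scores)      # step 1: R = 82 - 28 = 54
table = frequency_distribution(scores, i=5, lowest_limit=25)

# Print the highest class first, with a tally column and the total N.
for lower, upper, f in reversed(table):
    print(f"{upper} - {lower}  {'/' * f:<8} {f}")
print("N =", sum(f for _, _, f in table))
```

Building the table from the lowest class upward mirrors step 4 ("beginning at the bottom"). Only the printout is reversed to match the layout of the table above.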
MEASURES OF CENTRAL TENDENCY
For Grouped Data (> 30 values)
MEAN
Methods: 1. Midpoint Method 2. Short Method
Midpoint Method
After the f column, make another column and enter the midpoint (Xm) of each class. Multiply each frequency by its midpoint and enter the product in the next column, labeled fXm. Get the sum of this column and use the formula:
$$\bar{x} = \frac{\sum fX_m}{N}$$

Short Method
Choose a class at or near the middle of the distribution to be designated as the origin. After the f column, construct the deviation column (d). Mark the chosen class zero. In succession, write -1, -2 and so on for classes lower in value than the origin. In like manner, write 1, 2, 3 and so on for classes greater in value than the origin. Construct the f × d column and get its algebraic sum. Use the formula:
$$\bar{x} = z + \frac{(\sum fd)_{alg}}{N}\, i$$
where z = midpoint of the class chosen as origin and i = class interval size.

Problem:
For the given frequency distribution, compute the mean using the: Midpoint Method, Short Method.
Classes   54-50  49-45  44-40  39-35  34-30  29-25  24-20
f             4      7     12     10      9      6      2
Solution: Using the Midpoint Method
Classes   f    Xm    fXm
54-50     4    52    208
49-45     7    47    329
44-40    12    42    504
39-35    10    37    370
34-30     9    32    288
29-25     6    27    162
24-20     2    22     44
N = 50         ΣfXm = 1905
$$\bar{x} = \frac{\sum fX_m}{N} = \frac{1905}{50} = 38.1$$

Using the Short Method (origin: class 39-35, so z = 37; i = 5)
Classes   f    d    fd
54-50     4    3    12
49-45     7    2    14
44-40    12    1    12
39-35    10    0     0
34-30     9   -1    -9
29-25     6   -2   -12
24-20     2   -3    -6
N = 50         (Σfd)alg = 11
$$\bar{x} = z + \frac{(\sum fd)_{alg}}{N}\, i = 37 + \frac{11}{50}(5) = 38.1$$
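Both methods can be checked with a short Python sketch. It assumes the classes are stored as (lower limit, upper limit, f) tuples, highest class first as in the tables. The function names are illustrative only. The short method gives the same mean because the class marks are equally spaced i units apart.

```python
# Grouped-data mean by the midpoint method and the short method.
# Classes are (lower_limit, upper_limit, f), highest class first, as in the tables.
classes = [(50, 54, 4), (45, 49, 7), (40, 44, 12), (35, 39, 10),
           (30, 34, 9), (25, 29, 6), (20, 24, 2)]
i = 5  # class interval size


def class_mark(lower, upper):
    return (lower + upper) / 2


def mean_midpoint(classes):
    n = sum(f for _, _, f in classes)
    sum_fxm = sum(f * class_mark(lo, up) for lo, up, f in classes)
    return sum_fxm / n


def mean_short(classes, origin, i):
    """origin = index (in the highest-first list) of the class chosen as origin."""
    n = sum(f for _, _, f in classes)
    z = class_mark(classes[origin][0], classes[origin][1])
    # d is 0 at the origin, +1, +2, ... above it and -1, -2, ... below it;
    # because the list runs highest first, d = origin - position.
    sum_fd = sum(f * (origin - k) for k, (_, _, f) in enumerate(classes))
    return z + sum_fd / n * i


print(round(mean_midpoint(classes), 2))      # 38.1
print(round(mean_short(classes, 3, i), 2))   # 38.1 (origin: class 39-35, z = 37)
```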
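MEDIAN
Steps:
1. Find N/2.
2. Starting from the lowest class, accumulate the frequencies until the cumulative sum contains N/2. The class where this happens is the median class.
3. Use the formula:
$$Md = L + \frac{N/2 - cf}{f}\, i$$
where L = lower limit of the class that contains N/2, f = frequency of that class, cf = cumulative frequency of the classes below it (the cumulative sum that approaches or is equal to N/2), and i = class interval size.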
MODE
Rough Mode (R. Mo): obtained by inspection; it is equal to the midpoint (Xm) of the class having the highest frequency.
Theoretical Mode (T. Mo):
$$T.Mo = 3Md - 2\bar{x}$$

Problem:
For the given frequency distribution in the previous problem, compute the: Median, R. Mode, T. Mode.
Solution: Computing for the Median
N/2 = 50/2 = 25
Classes   f    cf
54-50     4
49-45     7
44-40    12
39-35    10    27
34-30     9    17
29-25     6     8
24-20     2     2
N = 50, i = 5
The cumulative sum first contains 25 in class 39-35 (L = 35, cf = 17, f = 10):
$$Md = L + \frac{N/2 - cf}{f}\, i = 35 + \frac{25 - 17}{10}(5) = 39$$

Computing for the Mode
The highest frequency (f = 12) belongs to class 44-40, so R. Mode = 42.
Since Md = 39 and x̄ = 38.1,
$$T.Mode = 3Md - 2\bar{x} = 3(39) - 2(38.1) = 40.8$$
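A minimal Python sketch of the median and both modes for the same table follows. Following the text's convention, L is the lower class limit; many texts use the lower class boundary instead, which would shift the median by half a unit. The names `grouped_median` and `rough_mode` are illustrative. They are not the standard library's `statistics.median_grouped`, which works from raw data and class boundaries.

```python
# Median, rough mode and theoretical mode for the same grouped data.
# Classes: (lower_limit, upper_limit, f), highest class first.
classes = [(50, 54, 4), (45, 49, 7), (40, 44, 12), (35, 39, 10),
           (30, 34, 9), (25, 29, 6), (20, 24, 2)]
i = 5


def grouped_median(classes, i):
    """Md = L + (N/2 - cf) / f * i, with L the lower class LIMIT as in the text."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf = 0                                   # cumulative frequency below the current class
    for lower, _, f in reversed(classes):    # accumulate from the lowest class up
        if cf + f >= half:                   # this class contains N/2
            return lower + (half - cf) / f * i
        cf += f
    raise ValueError("empty distribution")


def rough_mode(classes):
    """Class mark of the class with the highest frequency."""
    lower, upper, _ = max(classes, key=lambda c: c[2])
    return (lower + upper) / 2


n = sum(f for _, _, f in classes)
mean = sum(f * (lo + up) / 2 for lo, up, f in classes) / n   # 38.1
md = grouped_median(classes, i)                                # 39.0
t_mode = 3 * md - 2 * mean                                     # theoretical mode
print(md, rough_mode(classes), round(t_mode, 1))               # 39.0 42.0 40.8
```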
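Other Measures of Position: Quartiles, Deciles, Percentiles
Quartiles: the values that divide the distribution into 4 equal parts.
$$Q_k = L + \frac{kN/4 - cf}{f}\, i$$
Deciles: the values that divide the distribution into 10 equal parts.
$$D_k = L + \frac{kN/10 - cf}{f}\, i$$
Percentiles: the values that divide the distribution into 100 equal parts.
$$P_k = L + \frac{kN/100 - cf}{f}\, i$$
Here L, f, cf and i have the same meaning as in the median formula, with kN/4, kN/10 or kN/100 taking the place of N/2.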
Problem:
For the given frequency distribution in the previous problem, compute: Q1, D3, P88.

Solution:
Classes   f    cf
54-50     4    50
49-45     7    46
44-40    12    39
39-35    10    27
34-30     9    17
29-25     6     8
24-20     2     2
N = 50, i = 5

Computing for Q1
kN/4 = 1(50)/4 = 12.5, which falls in class 34-30 (L = 30, cf = 8, f = 9):
$$Q_1 = 30 + \frac{12.5 - 8}{9}(5) = 32.5$$

Computing for D3
kN/10 = 3(50)/10 = 15, which also falls in class 34-30:
$$D_3 = 30 + \frac{15 - 8}{9}(5) = 33.89$$

Computing for P88
kN/100 = 88(50)/100 = 44, which falls in class 49-45 (L = 45, cf = 39, f = 7):
$$P_{88} = 45 + \frac{44 - 39}{7}(5) = 48.57$$
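Since quartiles, deciles and percentiles differ only in the divisor (4, 10 or 100), one function can cover all three. The sketch below assumes the same highest-first (lower limit, upper limit, f) layout, with L as the lower class limit as in the text. The name `fractile` is hypothetical.

```python
# Quartiles, deciles and percentiles share one formula:
#   F = L + (kN/parts - cf) / f * i, with parts = 4, 10 or 100.
classes = [(50, 54, 4), (45, 49, 7), (40, 44, 12), (35, 39, 10),
           (30, 34, 9), (25, 29, 6), (20, 24, 2)]   # highest class first
i = 5


def fractile(classes, k, parts, i):
    n = sum(f for _, _, f in classes)
    target = k * n / parts          # position kN/4, kN/10 or kN/100
    cf = 0                          # cumulative frequency below the current class
    for lower, _, f in reversed(classes):
        if cf + f >= target:
            return lower + (target - cf) / f * i   # L = lower class limit, as in the text
        cf += f
    raise ValueError("k/parts must not exceed 1")


q1 = fractile(classes, 1, 4, i)      # Q1
d3 = fractile(classes, 3, 10, i)     # D3
p88 = fractile(classes, 88, 100, i)  # P88
print(q1, round(d3, 2), round(p88, 2))   # 32.5 33.89 48.57
```

The median is the special case k = 2, parts = 4 (equivalently D5 or P50), so `fractile(classes, 2, 4, i)` returns 39.0.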
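MEASURES OF VARIATION
For Grouped Data (> 30 values)

RANGE
The range is computed as the difference between the upper limit of the highest class interval and the lower limit of the lowest class interval.

VARIANCE
$$\sigma^2 = \frac{\sum f(X_m - \bar{x})^2}{N}$$

STANDARD DEVIATION
$$\sigma = \sqrt{\frac{\sum f(X_m - \bar{x})^2}{N}}$$

MEAN DEVIATION
$$MD = \frac{\sum f|X_m - \bar{x}|}{N}$$

QUARTILE DEVIATION
$$Q = \frac{Q_3 - Q_1}{2}$$

Problem:
For the given frequency distribution, determine the: variance, standard deviation, mean deviation, quartile deviation.
Classes   89-85  84-80  79-75  74-70  69-65  64-60  59-55  54-50  49-45  44-40  39-35  34-30
f             1      1      2      3      4      4      7      6      6      6      3      1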
Solution: Computing for the Mean
Classes   f    Xm    fXm
89-85     1    87     87
84-80     1    82     82
79-75     2    77    154
74-70     3    72    216
69-65     4    67    268
64-60     4    62    248
59-55     7    57    399
54-50     6    52    312
49-45     6    47    282
44-40     6    42    252
39-35     3    37    111
34-30     1    32     32
N = 44         ΣfXm = 2443
$$\bar{x} = \frac{\sum fX_m}{N} = \frac{2443}{44} = 55.52 \approx 55.5$$
Computing for the Variance
Classes   f    Xm    Xm - x̄    (Xm - x̄)²    f(Xm - x̄)²
89-85     1    87     31.5      992.25        992.25
84-80     1    82     26.5      702.25        702.25
79-75     2    77     21.5      462.25        924.50
74-70     3    72     16.5      272.25        816.75
69-65     4    67     11.5      132.25        529.00
64-60     4    62      6.5       42.25        169.00
59-55     7    57      1.5        2.25         15.75
54-50     6    52     -3.5       12.25         73.50
49-45     6    47     -8.5       72.25        433.50
44-40     6    42    -13.5      182.25       1093.50
39-35     3    37    -18.5      342.25       1026.75
34-30     1    32    -23.5      552.25        552.25
N = 44                                 Σf(Xm - x̄)² = 7329
$$\sigma^2 = \frac{\sum f(X_m - \bar{x})^2}{N} = \frac{7329}{44} = 166.57$$

Computing for the Standard Deviation
$$\sigma = \sqrt{\frac{\sum f(X_m - \bar{x})^2}{N}} = \sqrt{166.57} = 12.906$$
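The range, variance and standard deviation can be sketched in Python as follows. The formulas divide by N, as above. One assumption differs from the worked solution: the sketch keeps the unrounded mean 2443/44 ≈ 55.52 instead of 55.5, so Σf(Xm - x̄)² comes out at about 7328.98 rather than 7329. The variance and standard deviation still round to 166.57 and 12.906.

```python
import math

# Second distribution, highest class first: (lower_limit, upper_limit, f).
classes = [(85, 89, 1), (80, 84, 1), (75, 79, 2), (70, 74, 3),
           (65, 69, 4), (60, 64, 4), (55, 59, 7), (50, 54, 6),
           (45, 49, 6), (40, 44, 6), (35, 39, 3), (30, 34, 1)]

n = sum(f for _, _, f in classes)                        # N = 44
marks = [(lo + up) / 2 for lo, up, _ in classes]         # class marks Xm
freqs = [f for _, _, f in classes]
mean = sum(f * xm for f, xm in zip(freqs, marks)) / n    # 2443 / 44, unrounded

# Variance with divisor N, as in the formula above.
variance = sum(f * (xm - mean) ** 2 for f, xm in zip(freqs, marks)) / n
sd = math.sqrt(variance)

# Range: upper limit of the highest class minus lower limit of the lowest class.
range_ = classes[0][1] - classes[-1][0]                  # 89 - 30
print(n, round(mean, 2), round(variance, 2), round(sd, 3), range_)
```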
Computing for the Mean Deviation
Classes   f    |Xm - x̄|    f|Xm - x̄|
89-85     1      31.5         31.5
84-80     1      26.5         26.5
79-75     2      21.5         43.0
74-70     3      16.5         49.5
69-65     4      11.5         46.0
64-60     4       6.5         26.0
59-55     7       1.5         10.5
54-50     6       3.5         21.0
49-45     6       8.5         51.0
44-40     6      13.5         81.0
39-35     3      18.5         55.5
34-30     1      23.5         23.5
N = 44                 Σf|Xm - x̄| = 465
$$MD = \frac{\sum f|X_m - \bar{x}|}{N} = \frac{465}{44} = 10.6$$

Computing for the Quartile Deviation
Classes   f    cf
89-85     1    44
84-80     1    43
79-75     2    42
74-70     3    40
69-65     4    37
64-60     4    33
59-55     7    29
54-50     6    22
49-45     6    16
44-40     6    10
39-35     3     4
34-30     1     1
N = 44, i = 5

For Q1: kN/4 = 1(44)/4 = 11, which falls in class 49-45 (L = 45, cf = 10, f = 6):
$$Q_1 = 45 + \frac{11 - 10}{6}(5) = 45.83$$
For Q3: kN/4 = 3(44)/4 = 33, which falls in class 64-60 (L = 60, cf = 29, f = 4):
$$Q_3 = 60 + \frac{33 - 29}{4}(5) = 65$$
$$Q = \frac{Q_3 - Q_1}{2} = \frac{65 - 45.83}{2} = 9.585$$
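Here is a hedged Python sketch of the mean deviation and quartile deviation for the same distribution. As before, L is the lower class limit and the mean is left unrounded. The quartile is taken from the first class whose cumulative frequency reaches kN/4. For Q3 that is class 64-60, with cf = 29 below it, which gives Q3 = 65. The helper name `quartile` is illustrative.

```python
# Second distribution, highest class first: (lower_limit, upper_limit, f).
classes = [(85, 89, 1), (80, 84, 1), (75, 79, 2), (70, 74, 3),
           (65, 69, 4), (60, 64, 4), (55, 59, 7), (50, 54, 6),
           (45, 49, 6), (40, 44, 6), (35, 39, 3), (30, 34, 1)]
i = 5
n = sum(f for _, _, f in classes)
mean = sum(f * (lo + up) / 2 for lo, up, f in classes) / n

# Mean deviation: average absolute distance of the class marks from the mean.
mean_dev = sum(f * abs((lo + up) / 2 - mean) for lo, up, f in classes) / n


def quartile(classes, k, i):
    """Q_k = L + (kN/4 - cf) / f * i, with L the lower class limit."""
    n = sum(f for _, _, f in classes)
    target = k * n / 4
    cf = 0
    for lower, _, f in reversed(classes):     # walk from the lowest class up
        if cf + f >= target:                 # first class whose cumulative sum reaches kN/4
            return lower + (target - cf) / f * i
        cf += f
    raise ValueError("k must be between 0 and 4")


q1 = quartile(classes, 1, i)   # 45 + (11 - 10)/6 * 5
q3 = quartile(classes, 3, i)   # 60 + (33 - 29)/4 * 5
qd = (q3 - q1) / 2
print(round(mean_dev, 2), round(q1, 2), q3, round(qd, 2))   # 10.57 45.83 65.0 9.58
```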
Types of Graphs
BAR GRAPH
The bar graph is particularly useful in presenting data gathered from discrete variables on a nominal scale. It uses rectangles or bars to represent discrete classes of data. The base of each bar corresponds to a class interval of the frequency distribution, and the heights of the bars represent the frequencies associated with each class.
HISTOGRAM
The histogram is similar to a bar chart, but the bases of the bars are the class boundaries rather than the class limits.

FREQUENCY POLYGON
A frequency polygon is a line graph of class frequencies plotted against class marks.

Problem:
For the following frequency distribution, construct a: bar graph, histogram, frequency polygon.
Classes   54-50  49-45  44-40  39-35  34-30  29-25  24-20
f             4      7     12     10      9      6      2

[Figure: BAR GRAPH. Frequency (0 to 15) for classes 20-24 through 50-54]
[Figure: HISTOGRAM. Frequency (0 to 15) plotted against class boundaries]
[Figure: FREQUENCY POLYGON. Frequency (0 to 15) plotted for classes 20-24 through 50-54]
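The coordinates behind the histogram and the frequency polygon can be computed with a small Python sketch. It assumes integer scores, so adjacent class limits are 1 apart and each class boundary lies 0.5 beyond its limit (for example, 19.5 to 24.5 for class 20-24). Drawing is left to whatever plotting tool is used. The sketch only prints the bar edges and polygon vertices.

```python
# Classes (lower_limit, upper_limit, f), lowest class first, for plotting left to right.
classes = [(20, 24, 2), (25, 29, 6), (30, 34, 9), (35, 39, 10),
           (40, 44, 12), (45, 49, 7), (50, 54, 4)]

# Integer scores leave a gap of 1 between one class's upper limit and the
# next class's lower limit, so each boundary lies half that gap beyond the limit.
GAP = 1
histogram_bars = [(lo - GAP / 2, up + GAP / 2, f) for lo, up, f in classes]

# Frequency polygon vertices: (class mark, frequency).
polygon = [((lo + up) / 2, f) for lo, up, f in classes]

for (left, right, f), (xm, _) in zip(histogram_bars, polygon):
    print(f"bar {left}-{right}  height {f}   polygon point ({xm}, {f})")
```

Because the boundaries of adjacent classes coincide, the histogram bars touch, unlike the separated bars of a bar graph.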
PIE CHART
A pie chart is used to represent quantities that make up a whole.

Problem:
The following table classifies enrolment in a certain university. Construct a pie chart to show the enrolment distribution.
Engineering         5280
Commerce            3000
Education           1800
Arts & Sciences     1320
Law                  600

[Figure: Pie chart of enrolment: Engineering, Commerce, Education, Arts & Sciences, Law]
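Each sector of a pie chart gets a central angle proportional to its share of the whole: (count / total) × 360°. The sketch below computes the shares and angles for the enrolment table. The variable names are illustrative.

```python
enrolment = {
    "Engineering": 5280,
    "Commerce": 3000,
    "Education": 1800,
    "Arts & Sciences": 1320,
    "Law": 600,
}
total = sum(enrolment.values())   # 12000

# Each sector's central angle is its share of the whole times 360 degrees.
angles = {name: count / total * 360 for name, count in enrolment.items()}

for name, count in enrolment.items():
    print(f"{name:<16}{count:>6}{count / total:>8.1%}{angles[name]:>8.1f} deg")
```

With a total of 12000, Engineering takes 44% of the circle (158.4°) and Law 5% (18°), and the five angles add up to 360°.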
CUMULATIVE FREQUENCY CURVE (Ogive Curve)
An ogive curve is a line graph obtained by plotting values from a table arranged by class intervals whose frequencies are cumulated. From this curve, the centile rank of a certain score can be determined. A centile rank denotes the percentage of scores that fall below a specified score in a distribution.

Problem:
Construct the ogive curve for the given frequency distribution. What score corresponds to C50? To C88? What is the centile rank of a score of 50?

Classes   f     cf    CP = (cf/N)(100)
64-60     2    376    100.0
59-55    12    374     99.5
54-50    20    362     96.3
49-45    32    342     91.0
44-40    46    310     82.4
39-35    58    264     70.2
34-30    64    206     54.8
29-25    58    142     37.8
24-20    42     84     22.3
19-15    23     42     11.2
14-10    15     19      5.1
9-5       4      4      1.1
N = 376

[Figure: Ogive Curve. CP (0 to 120) plotted against upper class limits (UL) 9, 14, ..., 64]

Reading from the ogive: C50 = 33 and C88 = 48; a score of 50 has a centile rank of 91 (C91).
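The cumulative columns and the ogive readings can be sketched in Python. Two assumptions apply. First, the ogive points are (upper class limit, cf), matching the UL axis of the figure. Second, straight-line interpolation between those points stands in for reading a hand-drawn curve. Because of that, its answers (about 32.6, 47.3 and 92) differ slightly from the graphical readings above. The function names are hypothetical.

```python
# Ogive for the distribution above, lowest class first: (lower_limit, upper_limit, f).
classes = [(5, 9, 4), (10, 14, 15), (15, 19, 23), (20, 24, 42),
           (25, 29, 58), (30, 34, 64), (35, 39, 58), (40, 44, 46),
           (45, 49, 32), (50, 54, 20), (55, 59, 12), (60, 64, 2)]

n = sum(f for _, _, f in classes)   # N = 376
points = []                         # ogive points: (upper limit, cumulative frequency)
cf = 0
for _, upper, f in classes:
    cf += f
    points.append((upper, cf))
cp = [round(c / n * 100, 1) for _, c in points]   # CP column, lowest class first


def score_at_centile(c):
    """Score whose centile rank is c, by straight-line interpolation between ogive points."""
    target = c / 100 * n
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1:
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("centile outside the plotted range")


def centile_rank(score):
    """Percentage of scores below `score`, interpolated the same way."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= score <= x1:
            return (y0 + (score - x0) / (x1 - x0) * (y1 - y0)) / n * 100
    raise ValueError("score outside the plotted range")


print(cp)
print(round(score_at_centile(50), 1), round(score_at_centile(88), 1), round(centile_rank(50), 1))
```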
Kurtosis and Skewness
The measures of skewness and kurtosis indicate the extent to which a distribution departs from the normal distribution and permit comparison of two or more distributions.

KURTOSIS (ku)
Kurtosis refers to the flatness or peakedness of a frequency distribution; it describes the shape of the curve of one distribution relative to that of another. The coefficient of kurtosis is given by:
$$ku = \frac{Q}{P_{90} - P_{10}}$$
where Q = (Q3 - Q1)/2 is the quartile deviation.

Types of Kurtosis
leptokurtic (ku < 0.263)
mesokurtic (ku = 0.263, the value for a normal curve)
platykurtic (ku > 0.263)

SKEWNESS (sk)
Skewness refers to the symmetry or asymmetry of a frequency distribution. The coefficient of skewness is given by:
$$sk = \frac{3(\bar{x} - Md)}{s}$$
where s is the standard deviation.
If sk = 0, the distribution is symmetric, like the normal curve (x̄ = Md = Mo).
If sk < 0, the distribution is negatively skewed (x̄ < Md < Mo).
If sk > 0, the distribution is positively skewed (Mo < Md < x̄).

Problem:
For a certain frequency distribution, the following data are given:
x̄ = 147     Md = 147.25     s = 13.7
Q1 = 138    Q3 = 155.8      D1 = 128.8     P90 = 167.5
Determine the kurtosis and skewness of the distribution. Is it a normal distribution?
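Solution:
$$Q = \frac{Q_3 - Q_1}{2} = \frac{155.8 - 138}{2} = 8.9$$
$$ku = \frac{Q}{P_{90} - P_{10}} = \frac{8.9}{167.5 - 128.8} = 0.23 \qquad (P_{10} = D_1)$$
Since ku < 0.263, the distribution is leptokurtic.
$$sk = \frac{3(\bar{x} - Md)}{s} = \frac{3(147 - 147.25)}{13.7} = -0.05$$
Since sk < 0, the distribution is negatively skewed. Being leptokurtic and skewed, it is not a normal distribution.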
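Both coefficients are one-line formulas. The Python sketch below restates them with the problem's givens, using P10 = D1. The function names and the `normal_value` parameter are illustrative. The classification uses the exact comparisons stated in the Types of Kurtosis list.

```python
def quartile_deviation(q1, q3):
    return (q3 - q1) / 2


def kurtosis_coefficient(q1, q3, p10, p90):
    """ku = Q / (P90 - P10), the percentile coefficient used in the text."""
    return quartile_deviation(q1, q3) / (p90 - p10)


def kurtosis_type(ku, normal_value=0.263):
    if ku < normal_value:
        return "leptokurtic"
    if ku > normal_value:
        return "platykurtic"
    return "mesokurtic"


def skewness_coefficient(mean, median, s):
    """sk = 3(mean - median) / s."""
    return 3 * (mean - median) / s


ku = kurtosis_coefficient(q1=138, q3=155.8, p10=128.8, p90=167.5)   # P10 = D1
sk = skewness_coefficient(mean=147, median=147.25, s=13.7)
print(round(ku, 2), kurtosis_type(ku))   # 0.23 leptokurtic
print(round(sk, 2))                      # -0.05
```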
Student Activity
Part I. Answer the following:
1. Define each of the following:
   a. class mark   b. ogive   c. histogram   d. frequency polygon
2. What advantages does each of the following forms of presenting data offer?
   a. textual   b. tabular   c. graphical
3. Distinguish between:
   a. class limits and class boundaries   b. skewness and kurtosis
4. Give the class mark, the class boundaries and the interval size for each of the following:
   a. 10 - 19   b. 1.5 - 5.0   c. 12.85 - 13.43

Part II. Solve the following using Microsoft Excel.
The list below gives the weekly food budget and weekly income for 39 households.
Food Budget:   1598 1680 1660 1583 1476 1633 1717 1596 1613 1607 1728 1672 1572 1634 1461 1726 1732 1620 1616 1579
Weekly Income: 1553 1740 1652 1581 1481 1634 1692 1561 1566 1626 1699 1685 1589 1571 1443 1712 1724 1628 1564 1526
Food Budget:   1639 1655 1736 1587 1622 1689 1700 1613 1615 1458 1750 1700 1654 1625 1565 1563 1566 1587 1584
Weekly Income: 1636 1677 1761 1603 1605 1631 1765 1688 1667 1479 1747 1673 1641 1613 1521 1583 1542 1567 1610

1. Construct a frequency distribution table for food budget using i = 25 and determine the:
   a. mean   b. median   c. rough and theoretical mode   d. skewness
2. Construct a frequency distribution table for weekly income using i = 25 and determine the:
   a. standard deviation   b. mean deviation   c. quartile deviation   d. kurtosis
3. Plot a bar chart for food budget and superimpose on it the frequency polygon for weekly income.
4. Take the difference between weekly income and food budget for each household and construct a frequency distribution and a cumulative frequency distribution.
5. Plot the ogive curve for the data in (4). What score corresponds to a centile rank of 71?
Proceed to Topic 4