
BSTAT HANDOUTS - DESCRIPTIVE ONLY Handouts 3

The document discusses measures of central tendency and variability used to summarize quantitative data. It defines the mean, weighted mean, and median, and provides examples of calculating each. It differentiates between population and sample measures, using Greek letters for parameters and Latin letters for statistics.

Uploaded by

Clint Parzan

UNIVERSITY OF ST. LA SALLE
Yu An Log College of Business and Accountancy

BSTAT – BUSINESS STATISTICS


First Semester, AY 2020 – 2021

HANDOUTS 3

MEASURES OF CENTRAL TENDENCY & VARIABILITY

• Recall: Statistics involves a body of techniques and procedures dealing with the collection,
organization, analysis, interpretation, and presentation of information that can be stated
numerically.

Summarizing data involves using statistical tools and procedures appropriate for answering a research
problem or objective.

The following terms need to be differentiated:

Measure – a numerical representation of a particular characteristic (variable of the study) of the group
being studied

Parameter – A measure calculated from the population; usually represented by letters of the Greek
alphabet
Statistic – A measure calculated from the sample; usually represented by letters of the English alphabet

Summaries of QUALITATIVE DATA:

• Qualitative data are summarized using the following measures:

  o proportions (also called relative frequencies)
  o percentages

For example, the variable sex is coded as

M – 0
F – 1

Remark: Since “sex” is a qualitative variable and the codes 0 and 1 represent nominal data, the
codes are labels, not numbers with values. It is therefore not correct to apply arithmetic
operations such as addition and division to get an “average sex”, which would make no
sense for a qualitative variable. Instead, use the proportion (or percentage) of males (or
females) in the group.

Say, “Two out of 10 students are male,” or “twenty percent of the students are male.”
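
The summary described in the remark can be sketched in Python. This is a minimal illustration only: the list `sex_codes` is hypothetical data, chosen to match the "two out of 10" statement, and is not taken from the handout.

```python
# Hypothetical codes for 10 students: 0 = male, 1 = female (illustrative data only).
sex_codes = [0, 1, 1, 1, 0, 1, 1, 1, 1, 1]

n = len(sex_codes)
n_male = sex_codes.count(0)            # count a category; do not average the codes

proportion_male = n_male / n           # relative frequency
percent_male = 100 * proportion_male   # percentage

print(f"{n_male} out of {n} students are male "
      f"(proportion = {proportion_male:.2f}, {percent_male:.0f}%)")
```

The codes are only counted, never added or averaged, which is the point of the remark.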

Summaries of QUANTITATIVE DATA:

• Quantitative data are usually summarized in terms of the center and spread of the distribution.
• The center of the distribution can be identified using an appropriate measure of central
tendency or location.

LEONARES, S. R. 1
MEASURES OF CENTRAL TENDENCY OR LOCATION (AVERAGES)

A measure of central tendency or location is

• a representative value of the data set
• the value around which most of the data points are found

(ARITHMETIC) MEAN

• Computed by summing all the data values in the sample or population and dividing the sum by
the number of observations (usually referred to as the “average”)
• Most important measure representing the center of the distribution if the distribution is
symmetric
• Data must be at least interval
• Most stable measure of location, especially for large data sets
• When n is small, the mean is very sensitive to extreme values
• Differentiate between the population and sample means by their symbols:

Population Mean: $\mu = \dfrac{\sum x_i}{N}$, where $x_i$ is the ith score or observation, and N is the number
of observations in the population (the parameter is $\mu$, the Greek letter “mu”)

Sample Mean: $\bar{x} = \dfrac{\sum x_i}{n}$, where $x_i$ is the ith score or observation, and n is the number of
observations in the sample (the statistic is $\bar{x}$, read as “x-bar”)

• Why differentiate between $\mu$ and $\bar{x}$: if the research procedure is a population study, then
a population symbol (parameter) must be used; if it is a sample study, then a sample
symbol (statistic) must be used. This will be a very important distinction in inferential
statistics.
• That is why it is important to determine at the beginning of the research process whether you
will be doing a population or a sample study, since it has a bearing on the
notations/symbols used for parameters or statistics.

Example 1: During a particular summer month, the eight salespeople in an appliance store sold the
following number of central air-conditioning units: 8, 11, 5, 14, 8, 11, 16, 11. Considering this month as
the statistical population of interest, the mean number of units sold is

$\mu = \dfrac{\sum x_i}{N} = \dfrac{84}{8} = 10.5$ central a/c units

Why $\mu$? Because the problem states that the month should be considered the statistical population of
interest.
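
The computation in Example 1 can be checked with a few lines of Python. This is only a sketch; the variable names `units_sold`, `N`, and `mu` are illustrative choices that mirror the handout's symbols.

```python
units_sold = [8, 11, 5, 14, 8, 11, 16, 11]   # Example 1 data

N = len(units_sold)            # number of observations in the population
total = sum(units_sold)        # sum of all x_i
mu = total / N                 # population mean: sum of x_i divided by N

print(f"sum = {total}, N = {N}, mu = {mu} central a/c units")
```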

WEIGHTED MEAN

• Also called the weighted average
• An arithmetic mean in which each value is weighted according to its importance in the overall
group
• Formulas for the population and sample weighted means are identical:

$\mu_w$ or $\bar{X}_w = \dfrac{\sum (wX)}{\sum w}$

• Operationally, each value in the group (X) is multiplied by the appropriate weight factor (w), and
the products are then summed and divided by the sum of the weights.

Example 2: In a multiproduct company, the profit margins for the company’s four product lines during
the past fiscal year were: line A, 4.2 percent; line B, 5.5 percent; line C, 7.4 percent; and line D, 10.1
percent.

The unweighted mean profit margin is

$\mu = \dfrac{\sum x}{N} = \dfrac{27.2}{4} = 6.80\%$

However, unless the four product lines have equal sales, this unweighted average does not describe the
company’s overall profit margin. Given the sales totals in the following table, which are not all equal,
the weighted mean correctly describes the overall average.

Product Line   Profit Margin, X (%)   Sales, in Php (w)   wX
A              4.2                    30,000,000          126,000,000
B              5.5                    20,000,000          110,000,000
C              7.4                     5,000,000           37,000,000
D              10.1                    3,000,000           30,300,000
Total                                 Php58,000,000       Php303,300,000

Hence, the weighted mean profit margin is

$\mu_w = \dfrac{303{,}300{,}000}{58{,}000{,}000} \approx 5.23\%$

Remark: The weighted mean is used in computing for final grades when the number of units of
the subjects are not equal. Each grade is multiplied by the number of units of the
subject, and the sum of the (grades x no. of units) is divided by the total number of
units taken.
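
As a rough sketch of the operational description of the weighted mean, the profit-margin example can be computed in Python as follows. The helper name `weighted_mean` is an illustrative choice, not something defined in the handout.

```python
margins = [4.2, 5.5, 7.4, 10.1]                          # X, profit margin in %
sales = [30_000_000, 20_000_000, 5_000_000, 3_000_000]   # w, sales in Php

def weighted_mean(values, weights):
    """Sum of the products w*X divided by the sum of the weights w."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

unweighted = sum(margins) / len(margins)   # treats all four lines as equally important
weighted = weighted_mean(margins, sales)   # lines with larger sales count for more

print(f"unweighted mean = {unweighted:.2f}%, weighted mean = {weighted:.2f}%")
```

The same function would apply to the final-grade remark, with grades as `values` and the number of units per subject as `weights`.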

MEDIAN

• Center of an array (arrangement of the data from lowest to highest)
• Divides the array into two equal parts
• Useful for summarizing skewed distributions because it is not sensitive to extreme values
• Equal to the mean for symmetric distributions
• Data must be at least ordinal
• If N (or n) is odd, the median is the middle number of the array
• If N (or n) is even, the median is the mean of the two middle values

• Population Median: $\tilde{\mu}$ (read “mu-tilde”)

• Sample Median: $\tilde{x}$ (read “x-tilde”)

Example 3: The eight salespeople described in Example 1 sold the following number of central air-
conditioning units, in ascending order: 5, 8, 8, 11, 11, 11, 14, 16. Find the median.

Array: 5, 8, 8, 11, 11, 11, 14, 16

$\tilde{\mu} = \dfrac{11 + 11}{2} = 11$ central a/c units

Since the number of data values is even (N = 8), then the value of the median is the mean of the two
middle values, which are the fourth and fifth values in the ordered group. Both these values equal “11”
in this case, so adding the two 11’s and dividing by 2 gives the median which is equal to 11. Note that
there is an equal number of data points below and above the median (5, 8, 8, 11 are below; 11, 11, 14,
16 are above).

Example 4: The reaction times for a random sample of 9 subjects to a stimulant were recorded as 2.5,
3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, and 3.4 seconds. Calculate the median.

First form the array: 2.3, 2.5, 2.6, 2.9, 3.1, 3.4, 3.6, 4.1, 4.3

Since there are 9 data values (odd), there is only one middle value, the fifth value in the array:

2.3, 2.5, 2.6, 2.9, [3.1], 3.4, 3.6, 4.1, 4.3

$\tilde{x} = 3.1$ seconds

Why $\tilde{x}$? Because the problem specifically identifies the group as a random sample.

NOTE: When the problem does not specifically indicate whether the group involved is a sample
or population, treat the data set as a sample.
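
The odd/even rule for the median can be sketched in Python as below. The function `median_of` is an illustrative helper written for this note; it simply follows the two rules stated in the bullet list and is applied to the data of Examples 3 and 4.

```python
def median_of(values):
    """Middle value of the sorted array; mean of the two middle values if the count is even."""
    array = sorted(values)          # form the array (lowest to highest)
    n = len(array)
    mid = n // 2
    if n % 2 == 1:                  # odd count: one middle value
        return array[mid]
    return (array[mid - 1] + array[mid]) / 2   # even count: mean of the two middle values

units_sold = [8, 11, 5, 14, 8, 11, 16, 11]                       # Example 3 (N = 8, even)
reaction_times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 3.4]   # Example 4 (n = 9, odd)

print("Example 3 median:", median_of(units_sold))
print("Example 4 median:", median_of(reaction_times))
```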

Recall Example 1:

During a particular summer month, the eight salespeople in an appliance store sold the following number
of central air-conditioning units: 8, 11, 5, 14, 8, 11, 16, 11. Considering this month as the statistical
population of interest,
a. the mean number of units sold is

$\mu = \dfrac{\sum x_i}{N} = \dfrac{84}{8} = 10.5$ central a/c units

b. the median value from Example 3 is

$\tilde{\mu} = \dfrac{11 + 11}{2} = 11$ central a/c units

Dot plot: The mean and median are relatively close to each other.

[Dot plot of the eight values on a horizontal axis from 5 to 16]

 The mean and the median values would be considered to be good representatives of the
data set since they are located in the center of the distribution (where the points are).

 What if, instead of 16, the highest value is 160?

 Then the last point of the dot plot would be very far from the rest of the points (extremely high
value) – it can also be called an outlier.

Solution with the outlier, 160:

Array: 5, 8, 8, 11, 11, 11, 14, 160

Then: $\mu = \dfrac{\sum x_i}{N} = \dfrac{228}{8} = 28.5$ central a/c units

$\tilde{\mu} = \dfrac{11 + 11}{2} = 11$ central a/c units

 The resulting value of the mean is not found at the center of where the points are
(28.5 is far from the majority of the points), while the median remains the same.

 The value of the mean is affected if there are extreme values in the distribution, hence, it
cannot be used to represent the distribution if the shape is skewed. That is why, one
condition for its use as a representative value is that the shape must be symmetric.

 On the other hand, the median has not changed, because only the middle value (if n is
odd) or the mean of the two middle values (if n is even) is used; the extreme value is not
used in determining the median. Therefore, the median is a better representative value if
the shape of the distribution is skewed.
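
The effect of the outlier can be reproduced with Python's standard `statistics` module. This is a small sketch using the two arrays from the discussion above; the variable names are illustrative.

```python
from statistics import mean, median

original = [5, 8, 8, 11, 11, 11, 14, 16]        # Example 1 data
with_outlier = [5, 8, 8, 11, 11, 11, 14, 160]   # highest value replaced by 160

for label, data in (("original", original), ("with outlier", with_outlier)):
    # The mean uses every value, so it moves with the outlier;
    # the median uses only the middle of the array, so it does not.
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")
```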

MODE

• Value in the data set which has the highest frequency (occurs most often)
• Can be applied to any measurement level
• May not exist (the data set may not have a mode if all the values occur with the same frequency)
• May not be unique, if it exists (a data set may have more than one value with the same
highest frequency)
• Related to the concept of a peak or peaks in the frequency distribution
  o Unimodal – one peak
  o Bimodal – two peaks, etc.

• Population Mode: Mo
• Sample Mode: mo

Example 5: The eight salespeople described in Example 1 sold the following number of central air-
conditioning units: 8, 11, 5, 14, 8, 11, 16, and 11. Find the mode.

Mo =11 central air-conditioning units

Example 6: The reaction times for a random sample of 9 subjects to a stimulant were recorded as 2.5,
3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, and 3.4 seconds. Find the mode.

• Since all values occur only once (they have the same frequency), this distribution has
no mode; we say that the mode does not exist.
• This is different from saying that the mode is 0 (why?)
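
One way to sketch the mode rules in Python is shown below. The helper `modes` is hypothetical; it assumes the convention stated in this handout, that a data set has no mode when all of its (two or more distinct) values occur with the same frequency, and returns an empty list in that case.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency, or [] if all frequencies are equal."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []                       # no mode: every value occurs equally often
    return sorted(v for v, c in counts.items() if c == highest)

units_sold = [8, 11, 5, 14, 8, 11, 16, 11]                       # Example 5
reaction_times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 3.4]   # Example 6

print("Example 5 mode(s):", modes(units_sold))
print("Example 6 mode(s):", modes(reaction_times))   # empty list means the mode does not exist
```

Returning an empty list rather than 0 keeps "no mode" distinct from "the mode is 0".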

RELATIONSHIP BETWEEN THE MEAN AND THE MEDIAN:

Note that the shape of the distribution is important in choosing the most appropriate measure of central
tendency (and in other measures and tests as well). When there is no graph on which to base the shape,
comparing the mean and median values will indicate it:

a. symmetric distribution: mean = median
b. positively skewed distribution: mean > median
c. negatively skewed distribution: mean < median

Notes: 1. Since the mode does not always exist, it is just the mean and the median that are compared.
2. A positively skewed distribution indicates that the values mostly cluster on the lower half of
the distribution but there are few extremely high values. When the mean is computed, these
high values influence the value of the computed mean and pull its value away from the center
towards where the extremely high values are. On the other hand, the median is not affected
by extreme values, so it stays closer to where most of the values are. That is why, for a
positively skewed distribution, the median is a better representative value than the mean.
3. A negatively skewed distribution has the majority of the data clustering on the upper half of the
distribution, but there are a few extremely low values. For the same reason as in the positively
skewed distribution, the mean is pulled towards where the few extremely low values are.
The median is the better representative value compared to the mean.

READ: https://www.khanacademy.org/math/ap-statistics/quantitative-data-ap/describing-comparing-distributions/v/classifying-distributions

EXERCISES: Show complete solutions. For each item, identify the following needed information:
a. determine whether the data set constitutes a population or sample.
b. identify the variable of the problem (label this as X)

Example for #1:


a. sample (no mention of whether population or sample)
b. X : score in an achievement test in Mathematics (note: this is always stated in the
singular form)

1. The following are scores of 50 high school students in a 150-item achievement test in Mathematics.

112 107  97  69  72 115  81 102  91  76
 73  73  86  76  92  95 106  80  81 141
126 124 127 118 128  84  75  98 113 119
 82  83 134 132 104  68  95 106 115  98
 92  92 100  96 108 100 119 106  94  85

a. Find the mean, median, and mode.
b. What is the shape of the distribution?

2. According to a survey, the average person spends 45 minutes a day listening to recorded music. The
following data were obtained for the number of minutes spent listening to recorded music for a
sample of 30 individuals.
88.3 4.3 4.6 7.0 9.2
0.0 99.2 34.9 81.7 0.0
85.4 0.0 17.5 45.0 53.3
29.1 28.8 0.0 98.9 64.5
4.4 67.9 94.2 7.6 56.6
52.9 145.6 70.4 65.1 63.6

a. Compute the mean.
Do these data appear to be consistent with the average reported by the survey? Explain
your answer.

b. Compute the median.


Between the mean and the median, which measure do you think is more appropriate to use
for this data set? Why?

3. During a 30-day period, the daily numbers of cars rented from a car rental company were as follows:

7 10 6 7 9 4 7 9 9 8
5 5 7 8 4 6 9 7 12 7
9 10 4 7 5 9 8 9 5 7

a. Find the mean, median, and mode.

b. If the break-even point for the company is 8 cars per day, is the company doing well? Explain.

4. Find the preferred measure of central location for the sample whose observations 18, 10, 11, 98, 22,
15, 11, 25, and 17 represent the number of automobiles sold during this past month by 9 different
automobile agencies. Justify your choice.

5. For a sample of 15 students at an elementary-school snack bar, the following sales amounts arranged
in ascending order of magnitude are observed: Php10, 10, 25, 25, 27, 30, 33, 35, 40, 43, 45, 45, 50, 55,
60.
a. Determine the mean, median, and mode for these sales amounts.

b. How would you describe the distribution from the standpoint of skewness?

6. The following table shows the percentage of defective items in an assembly department. Determine
the overall percentage defective of all items assembled during the sampled week.

Shift   Percentage defective   Number of items, in thousands
1       1.1                    210
2       1.5                    120
3       2.3                    50

7. The average IQ of 10 students in a mathematics course is 114. If 9 of the students have IQs of 101,
125, 118, 128, 106, 115, 99, 118, and 109, what must be the other IQ?

8. What is the average for a student who received grades of 85, 76, and 82 on 3 tests and a 79 on the
final examination in a certain course if the final examination counts three times as much as each of
the 3 tests?

INTRODUCTION TO VARIABILITY

Consider the following two sets (male and female) of number of bottles of soft drink consumed in a week:

A 3 4 5 6 8 9 10 12 15
B 3 7 7 7 8 8 8 9 15

Fill the table with the needed information:

      n      $\bar{x}$      $\tilde{x}$
A
B

Describe the two sets with respect to the two measures: _______________________________________
_____________________________________________________________________________________

Remarks:
• The measures of central location do not give an adequate description of a given distribution if the
purpose is to differentiate between the two sets (the two sets have the same mean
and median):

      n      $\bar{x}$      $\tilde{x}$
A     9      8 bottles      8 bottles
B     9      8 bottles      8 bottles

• The two measures do not describe how the observations spread out from the average.

Consider the dot plot of the two sets (Set B above the line; Set A below):

[Dot plot of Sets A and B on a common horizontal axis from 3 to 15]

• The dot plot shows that the points of B are more closely clustered about the center, while the
points of A are scattered, yet they have the same mean and median.

• Therefore, there is a need to use a measure that will differentiate between the two distributions
in terms of how they are scattered/dispersed.

MEASURES OF VARIATION

• Also called measures of dispersion or variability
• Numerically describe the degree of dispersion, scatter or spread of scores in a distribution.

RANGE

• Difference in value between the highest (maximum) and the lowest (minimum) observation
• Can be computed very quickly
• But not very useful because it considers only the extremes
• Does not take into consideration the bulk of the observations
• The range is used when:
1. the data are too scant or too scattered to justify the computation of a more precise measure
of variability.
2. a knowledge of extreme scores or a total spread is all that is wanted.

Example: Find the range of the two sets given above.

$R_A$ = 15 – 3 = 12 bottles

$R_B$ = 15 – 3 = 12 bottles

• This example shows an instance wherein the range values are not able to differentiate between Set
A and Set B, although the dot plots present different “stories”.
• There is a need for a measure that is able to truly distinguish between the two sets.
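
The range computation for the two sets can be sketched in Python as follows. The names `set_a`, `set_b`, and `value_range` are illustrative.

```python
set_a = [3, 4, 5, 6, 8, 9, 10, 12, 15]
set_b = [3, 7, 7, 7, 8, 8, 8, 9, 15]

def value_range(data):
    """Highest observation minus lowest observation."""
    return max(data) - min(data)

# Only the two extremes enter the computation, so sets with very
# different interiors can still share the same range.
print("Range of A:", value_range(set_a))
print("Range of B:", value_range(set_b))
```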

STANDARD DEVIATION

• Most important and most commonly used measure of variation, together with the mean as a
measure of central tendency
• A measure of variability that is based on the difference between the value of each observation ($x_i$)
and the mean
• The difference between each $x_i$ and the mean is called a deviation about the mean

  o deviation = $x_i$ – mean (the mean is $\mu$ if the data set constitutes a population;
otherwise it is $\bar{x}$)

The standard deviation is used when:

• the statistic having the greatest stability is desired.
• the mean is the preferred measure of central tendency.

Definitional formula for the population standard deviation: $\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$

Definitional formula for the sample standard deviation: $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$

Raw score formula for the population standard deviation: $\sigma = \sqrt{\dfrac{N \sum x_i^2 - (\sum x_i)^2}{N^2}}$

Raw score formula for the sample standard deviation: $s = \sqrt{\dfrac{n \sum x_i^2 - (\sum x_i)^2}{n(n - 1)}}$

Calculation of the Variance and Standard Deviation: Raw Score Method

Remark: It would be good for you to have a scientific calculator with an SD mode so that you only
have to learn how to key in the data. Your calculator will generate the values of the measures that you
would like to solve for. Since different models work differently, search for a YouTube tutorial for the particular
calculator model that you have.

Example:

Given the following sample data set (xi) where X : score in a quiz ( n = 10):

$x_i$      $x_i^2$
32 pts   (32 pts)(32 pts) = 1,024 pts²
71 pts   (71 pts)(71 pts) = 5,041 pts²
64 pts   (64 pts)(64 pts) = 4,096 pts²
50 pts   (50 pts)(50 pts) = 2,500 pts²
48 pts   (48 pts)(48 pts) = 2,304 pts²
63 pts   (63 pts)(63 pts) = 3,969 pts²
38 pts   (38 pts)(38 pts) = 1,444 pts²
41 pts   (41 pts)(41 pts) = 1,681 pts²
47 pts   (47 pts)(47 pts) = 2,209 pts²
52 pts   (52 pts)(52 pts) = 2,704 pts²
Sum of the column:   $\sum x_i = 506$ pts   $\sum x_i^2 = 26{,}972$ pts²

$s^2 = \dfrac{10(26{,}972\ \text{pts}^2) - (506\ \text{pts})^2}{10(9)} = \dfrac{269{,}720\ \text{pts}^2 - 256{,}036\ \text{pts}^2}{90} = \dfrac{13{,}684\ \text{pts}^2}{90} = 152.04\ \text{pts}^2$

• Since it makes no sense to have a measure in squared units of the original unit
of measurement (e.g., pts²), the unit has to be reverted to the original unit
(pts), which is done by taking the square root of the variance.

Therefore, the standard deviation is $s = \sqrt{152.04\ \text{pts}^2} = 12.3$ pts
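
For readers without an SD-mode calculator, the raw score method for the quiz data can be sketched in Python. The variable names are illustrative; `statistics.stdev` is used only as a cross-check, since it computes the sample standard deviation (divisor n − 1).

```python
from math import sqrt
from statistics import stdev

scores = [32, 71, 64, 50, 48, 63, 38, 41, 47, 52]   # quiz scores, n = 10

n = len(scores)
sum_x = sum(scores)                      # sum of x_i
sum_x2 = sum(x * x for x in scores)      # sum of x_i squared

# Raw score formula for the sample variance, then its square root.
variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
s = sqrt(variance)

print(f"sum x = {sum_x}, sum x^2 = {sum_x2}")
print(f"s^2 = {variance:.2f} pts^2, s = {s:.1f} pts")
print(f"cross-check with statistics.stdev: {stdev(scores):.1f} pts")
```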

Example: Solve for the standard deviations of the two sets of data (A and B) given in the Introduction to Variability.

          Set A             Set B
        x      x²         x      x²
1       3       9         3       9
2       4      16         7      49
3       5      25         7      49
4       6      36         7      49
5       8      64         8      64
6       9      81         8      64
7      10     100         8      64
8      12     144         9      81
9      15     225        15     225
Sum    72     700        72     654

NOTE: If creating a table helps you, you may do so; otherwise, presenting the solution in terms
of summations (as below) without the table will suffice.

Set A: $\sum x = 72$, $\sum x^2 = 700$

$s_A^2 = \dfrac{9(700) - (72)^2}{9(8)} = \dfrac{6300 - 5184}{72} = \dfrac{1116}{72} = 15.5$

$s_A = \sqrt{15.5} = 3.9$ bottles

Set B: $\sum x = 72$, $\sum x^2 = 654$

$s_B^2 = \dfrac{9(654) - (72)^2}{9(8)} = \dfrac{5886 - 5184}{72} = \dfrac{702}{72} = 9.75$

$s_B = \sqrt{9.75} = 3.1$ bottles
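
The comparison of the two sets can also be run with Python's `statistics` module. This is a sketch under the handout's convention of treating both sets as samples, so `variance` and `stdev` (divisor n − 1) are used.

```python
from statistics import mean, stdev, variance

set_a = [3, 4, 5, 6, 8, 9, 10, 12, 15]
set_b = [3, 7, 7, 7, 8, 8, 8, 9, 15]

for label, data in (("A", set_a), ("B", set_b)):
    # Both sets share the same mean, but their spreads differ.
    print(f"Set {label}: mean = {mean(data)}, "
          f"s^2 = {variance(data):.2f}, s = {stdev(data):.1f}")
```

Unlike the range, the standard deviation uses every observation, which is why it separates the scattered set from the clustered one.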

VARIANCE
• Square of the standard deviation:
population variance: $\sigma^2$
sample variance: $s^2$
• Of little use in descriptive statistics because its calculated value is expressed in square units of
measurement

WATCH:
Statistics Fundamentals: The Mean, Variance and Standard Deviation.
https://www.youtube.com/watch?v=SzZ6GpcfoQY

Range, variance and standard deviation as measures of dispersion | Khan Academy.


https://www.youtube.com/watch?v=E4HAYd0QnRc&list=TLPQMDMwNTIwMjC21B29Fxwxjg&index=1

Mean, variance, and standard deviation (raw data).
https://www.youtube.com/watch?v=75-DpMsd-7w

APPLICATIONS OF THE STANDARD DEVIATION

A. COEFFICIENT OF VARIATION

• A relative measure of variation (comparing one relative to the other)
• Expresses the standard deviation as a percentage of the mean
• Expressed in percent, it can be used to compare the variability of two or more distributions when
  o observations are expressed in different units of measurement, or
  o the data sets being compared have different means
• The greater the CV, the greater the variability
• Formula: $CV = \dfrac{s}{\bar{x}} \times 100\%$

Note: terms used interchangeably: more uniform, more homogeneous, more compact, less dispersed,
less scattered, less variable, less heterogeneous, less varied

Remark:
In the investing world, the coefficient of variation allows you to determine how much volatility (risk) you
are assuming in comparison to the amount of return you can expect from your investment. In simple
language, the lower the ratio of standard deviation to mean return, the better your risk-return tradeoff.

Example: Consider two investment proposals, A and B, with the following data:

The coefficient of variation for each proposal is:

For A: $107.70/$230 x 100% = 47%


For B: $208.57/$250 x 100% = 83%

Therefore, because the coefficient of variation is a relative measure of risk, B is considered more risky
than A. Although B has a greater mean ($250) than A ($230), B is considered a more risky investment
since B is more volatile than A; that is, your earnings with B can vary from $41.43 to $458.57, while for
A they vary from $122.93 to $337.07 (the greater the CV, the more variable the data of the group).

Example: The weights of 10 boxes of a certain brand of cereal have a mean content of 278.0 grams with
a standard deviation of 9.6 grams. If these boxes were purchased at 10 different stores and the mean
price per box is PhP64.50 with a standard deviation of PhP4.50, can you conclude that the weights are
relatively more homogeneous than the prices?
$CV_w = \dfrac{9.6\ \text{grams}}{278.0\ \text{grams}} \times 100\% = 3.5\%$

$CV_p = \dfrac{\text{PhP}4.50}{\text{PhP}64.50} \times 100\% = 6.98\%$

• Yes, the weights are relatively more homogeneous than the prices, because the CV for the
weights is less than the CV for the prices.
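
The cereal example can be sketched in Python as follows. The helper name `coefficient_of_variation` is an illustrative choice; it simply applies the CV formula to a given standard deviation and mean.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_weight = coefficient_of_variation(9.6, 278.0)    # grams cancel out
cv_price = coefficient_of_variation(4.50, 64.50)    # pesos cancel out

print(f"CV of weights = {cv_weight:.2f}%")
print(f"CV of prices  = {cv_price:.2f}%")
print("More homogeneous:", "weights" if cv_weight < cv_price else "prices")
```

Because the units cancel, the two percentages can be compared directly even though one variable is in grams and the other in pesos.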

B. STANDARD SCORE

• Also called the z-score; a transformed raw score
• A measure of relative position
• Usually used to compare observations in two or more different distributions of raw scores which have
different means and/or different standard deviations
• No unit of measurement
• The mean of standard scores is zero
• A positive standard score indicates that the raw score is above the mean,
while a negative standard score shows that the raw score is below the mean
• The formula for transforming a raw score to a standard score, represented by z, is

$z = \dfrac{x - \bar{x}}{s}$

Example: Ruben got a final grade of 85 in both English and Physics. The mean final grades of his class in
these two courses are 80 in English and 75 in Physics with standard deviations of 12 and 10, respectively.
In which subject was his academic performance better in relation to his class?

Subject   Ruben’s final grade (x)   Class Mean   Class Std. Dev.
English   85                        80           12
Physics   85                        75           10

$z_E = \dfrac{85 - 80}{12} = 0.42$          $z_P = \dfrac{85 - 75}{10} = 1.00$

• Ruben had a better academic performance in Physics than in English in relation to
each of his classes.
• It is wrong to compare the grades of the two subjects per se since they are not comparable
(how can English be directly compared to Physics?).
• Rather, the student’s performance has to be rated in relation to the others in the same class.
• The grades need to be standardized to remove the differences in class means and standard deviations.

Example: Different typing skills are required for secretaries depending on whether one is working in a law
office, an accounting firm, or for a research mathematical group at a major university. In order to evaluate
candidates for these positions, an employment agency administers three distinct standardized typing
samples. A time penalty has been incorporated into the scoring of each sample based on the number of
typing errors. The mean and standard deviation for each test, together with the score achieved by a recent
applicant, are given in the following table. Determine which office this particular applicant should be
assigned.
Sample       Applicant’s score ($x_i$)   Mean ($\bar{x}$)   Standard deviation (s)
Law          141 sec                     180 sec            30 sec
Accounting   7 min                       10 min             2 min
Scientific   33 min                      26 min             5 min

X: applicant’s score in terms of the time it takes to finish a particular manuscript
(the shorter the time, the better the performance)

$z_L = \dfrac{141\ \text{sec} - 180\ \text{sec}}{30\ \text{sec}} = -1.30$

$z_A = \dfrac{7\ \text{min} - 10\ \text{min}}{2\ \text{min}} = -1.50$

$z_S = \dfrac{33\ \text{min} - 26\ \text{min}}{5\ \text{min}} = 1.40$

• Since a secretary is supposed to type speedily and accurately, a lower z-score is desired. This
particular applicant should be assigned to an accounting firm.
• There is no need to convert to the same units since the numerator units cancel with the
denominator units; z has no unit of measurement.
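
The typing-test comparison can be sketched in Python as below. The function `z_score` and the dictionary `typing_tests` are illustrative names; the figures are those given in the table for the applicant.

```python
def z_score(x, mean, sd):
    """Standard score: distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd

# (applicant's score, mean, standard deviation); units cancel within each row
typing_tests = {
    "Law": (141, 180, 30),        # seconds
    "Accounting": (7, 10, 2),     # minutes
    "Scientific": (33, 26, 5),    # minutes
}

z_scores = {name: z_score(*row) for name, row in typing_tests.items()}

# Scores are completion times, so the lowest z-score is the best relative performance.
best_fit = min(z_scores, key=z_scores.get)

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
print("Best fit:", best_fit)
```

For measures where higher is better, such as Ruben's grades, the same function applies but the largest z-score would be chosen with `max` instead of `min`.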

C. PEARSONIAN SKEWNESS

• Measure of relative asymmetry
• Compares the shapes of two or more distributions when
  o the means are different
  o the standard deviations are different
• No unit of measurement

• Formula: $SK = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$

• If SK > 0, the distribution is positively skewed
• If SK < 0, the distribution is negatively skewed
• If SK = 0, the distribution is symmetric

• Rule of thumb (Bulmer, 1979): If SK is

• less than −1 or greater than +1, the distribution is highly skewed.
• between −1 and −½ or between +½ and +1, the distribution is moderately skewed.
• between −½ and +½, the distribution is approximately symmetric.
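
A minimal Python sketch of the skewness formula and the rule of thumb follows. It is applied to the ten quiz scores from the raw score example and treats them as a sample. The function names are illustrative, and how the exact boundary values ±½ and ±1 are classified is an assumption of this sketch, since the rule of thumb does not specify it.

```python
from statistics import mean, median, stdev

def pearson_skewness(data):
    """SK = 3(mean - median) / standard deviation (sample standard deviation assumed)."""
    return 3 * (mean(data) - median(data)) / stdev(data)

def describe_skewness(sk):
    """Classify SK by the rule of thumb; boundary handling is an assumption."""
    if sk < -1 or sk > 1:
        return "highly skewed"
    if abs(sk) >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"

scores = [32, 71, 64, 50, 48, 63, 38, 41, 47, 52]   # quiz scores from the raw score example

sk = pearson_skewness(scores)
direction = "positive" if sk > 0 else "negative" if sk < 0 else "none"
print(f"SK = {sk:.2f} ({describe_skewness(sk)}, direction: {direction})")
```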

D. EMPIRICAL RULE

• When the data are believed to approximate a bell-shaped distribution, the empirical rule can be
used to determine the approximate percentage of data values that lie within a specified number of
standard deviations of the mean, that is,
  o Approximately 68% of the data values will be within 1 standard deviation of the mean:
    $(\mu \pm 1\sigma) = (\mu - 1\sigma,\ \mu + 1\sigma)$.

  o Approximately 95% of the data values will be within 2 standard deviations of the mean:
    $(\mu \pm 2\sigma) = (\mu - 2\sigma,\ \mu + 2\sigma)$.

  o Approximately 99.7% of the data values will be within 3 standard deviations of the mean:
    $(\mu \pm 3\sigma) = (\mu - 3\sigma,\ \mu + 3\sigma)$.


Remarks on the bell-shaped curve (also called the normal curve):

1. the horizontal line can go much lower than $\mu - 4\sigma$ and much higher than $\mu + 4\sigma$.
2. the total area under the curve and above the horizontal line is 1 or 100%
3. since it is symmetric, the percentages between similarly distanced points on the x-axis from
the mean are equal (see above figure)
4. 0.15% (on the left of the figure) is the area from $\mu - 3\sigma$ and below; 0.15% (on the right of
the figure) is from $\mu + 3\sigma$ and above.
Example: Liquid detergent cartons are filled automatically on a production line. Filling weights
frequently have a bell-shaped distribution. If the mean filling weight is 16.00 ounces and the standard
deviation is 0.25 ounces, use the empirical rule to draw conclusions about the distribution of filling
weights.

$\mu = 16.00$ oz; $\sigma = 0.25$ oz

$\mu \pm 1\sigma$: 16.00 ± 0.25 ⇒ (16.00 − 0.25, 16.00 + 0.25) ⇒ (15.75, 16.25)
⇒ Approximately 68% of the liquid detergent cartons have filling weights between
15.75 oz and 16.25 oz

$\mu \pm 2\sigma$: 16.00 ± (2)0.25 ⇒ 16.00 ± 0.50 ⇒ (15.50, 16.50)
⇒ Approximately 95% of the liquid detergent cartons have filling weights
between 15.50 oz and 16.50 oz

$\mu \pm 3\sigma$: 16.00 ± (3)0.25 ⇒ 16.00 ± 0.75 ⇒ (15.25, 16.75)
⇒ Approximately 99.7% of the liquid detergent cartons have filling weights
between 15.25 oz and 16.75 oz
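
The three intervals in this example can be generated with a short Python sketch. The names `mu`, `sigma`, `coverage`, and `intervals` are illustrative; the percentages are the approximate values stated by the empirical rule.

```python
mu, sigma = 16.00, 0.25          # mean and standard deviation of filling weights, in oz

# Approximate share of values within k standard deviations of the mean.
coverage = {1: 68.0, 2: 95.0, 3: 99.7}

# Interval (mu - k*sigma, mu + k*sigma) for k = 1, 2, 3.
intervals = {k: (mu - k * sigma, mu + k * sigma) for k in coverage}

for k, (low, high) in intervals.items():
    print(f"mu ± {k} sigma: ({low:.2f}, {high:.2f}) oz "
          f"contains about {coverage[k]}% of cartons")
```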

EXERCISES

1. A goal of management is to help their company earn as much as possible relative to the capital
invested. One measure of success is return on equity – the ratio of net income to stockholder’s
equity. Shown here are return on equity percentages for 25 companies. Find the range, variance,
and standard deviation.
9.0 19.6 22.9 41.6 11.4
15.8 52.7 17.3 12.3 5.1
17.3 31.1 9.6 8.6 11.2
12.8 12.2 14.5 9.2 16.6
5.0 30.3 14.7 19.2 6.2

2. During a 30-day period, the daily numbers of cars rented from a car rental company were as follows:
7 10 6 7 9 4 7 9 9 8
5 5 7 8 4 6 9 7 12 7
9 10 4 7 5 9 8 9 5 7
Find the range, variance, and standard deviation.

3. A manufacturing firm regularly places orders with two different suppliers, A and B. The following
data are the number of days required to fill orders for these suppliers.
Supplier A: 11 10 9 10 11 11 10 11 10 10
Supplier B: 8 10 13 7 10 11 10 7 15 12
Determine which supplier provides the more consistent and reliable delivery times. Use the
range and standard deviation. Since you are comparing the two, why just use the standard
deviation and not compute for the coefficient of variation?

4. A production department uses a sampling procedure to test the quality of newly produced items.
The department employs the following decision rule at an inspection station: If a sample of 14
items has a variance of more than .005, the production line must be shut down for repairs.
Suppose the following data have been collected:
3.43 3.45 3.43 3.48 3.52 3.50 3.39
3.48 3.41 3.38 3.49 3.45 3.51 3.50
Should the production line be shut down? Why or why not?

5. Two friends want to take a summer holiday before going to college in the autumn. They are looking
for somewhere with plenty of clubs where they can party all night. Unfortunately they have left it
rather late to book and there are only two resorts, Medlena and Bistry, available within their
budget. When they ask about the ages of the holiday-makers at these resorts their travel agent
says the only thing he can tell them is that the mean age of people going to Medlena is 19,
whereas the mean age of visitors to Bistry is 22. Just as they are about to book holidays in Medlena
because it seems to attract the sort of young crowd they want to be with, the travel agent says,
‘I’ve got some more figures: the standard deviation of the ages of visitors to Medlena is 8 and the
standard deviation of the ages of visitors to Bistry is 2.’ Should they change their minds on the
basis of this new information, and if so, why?

6. Many national academic achievement and aptitude tests, such as the SAT, report standardized
test scores with the mean for the normative group used to establish scoring standards converted
to 500 with a standard deviation of 100. Suppose that the distribution of scores for such a test is
known to be approximately normally distributed. Determine the approximate percentage of
reported scores that would be
a. between 400 and 600
b. between 500 and 700
c. greater than 700
d. less than 200
Hint: Draw the bell-shaped curve and replace the values of $\mu$ and $\sigma$ on the horizontal axis.

7. An SAT test taker (refer to #6) got a score of 625. What is his standard score?

8. The same student (in #7) got the same score (625) in a different test, the mean of which is 450
and standard deviation 150. In which test did this student fare better?
