Chapter 2 - Describing Data: Frequency Distributions and Graphic Presentation
Introduction
This chapter begins our study of descriptive statistics. Recall from Chapter 1 that when
using descriptive statistics we merely describe a set of data. For example, we want to
describe the entry-level salary for a select group of professions. We find that the entry-level
salary for accountants is $38,000, for systems analysts $48,000, for infectious disease
specialists $80,000, and so on. This unorganized data provides little insight into the pattern
of entry-level salaries, which makes conclusions difficult.
This chapter presents a technique that is used to organize raw data into some meaningful
form. It is called a frequency distribution. Then, to better understand the main features of
the data, we portray the frequency distribution in the form of a frequency polygon, a
histogram, or a cumulative frequency distribution.
Frequency Distributions
A frequency distribution is a useful statistical tool for organizing a mass of data into some
meaningful form.
Frequency Distribution: A grouping of data into mutually exclusive classes showing the
number of observations in each.
As noted, a frequency distribution is used to summarize and organize large amounts of data.
As an example, the lengths of service, in years, of a sample of seventeen employees are given
above.
Years of service     Tally      Frequency
1 up to 3 years      //         2
3 up to 5 years      ///// /    6
5 up to 7 years      /////      5
7 up to 9 years      ///        3
9 up to 11 years     /          1
Total                           17

2. We used a class width of 2.
3. We used classes 1 up to 3, 3 up to 5, and so on.
4. Tally the lengths of service into the appropriate classes.
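The tallying step can be sketched in a few lines of Python. This is only an illustration: the function name `tally_classes` is ours, and the raw values in `years` are hypothetical, chosen so that they reproduce the class frequencies in the table (the original 17 observations are not reproduced here).

```python
def tally_classes(values, first_lower, width, num_classes):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = [0] * num_classes
    for v in values:
        # "3 up to 5" includes 3 but not 5, so use floor division.
        index = int((v - first_lower) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return counts

# Hypothetical lengths of service (years) that match the tallies above.
years = [2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, 8, 8, 10]

frequencies = tally_classes(years, first_lower=1, width=2, num_classes=5)
for i, f in enumerate(frequencies):
    lower = 1 + 2 * i
    print(f"{lower} up to {lower + 2} years: {f}")
```

Because each class runs from its lower limit "up to" (but not including) the next lower limit, every value lands in exactly one class.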
How many classes should there be? A common guideline is from 5 to 15. Having too few or
too many classes gives little insight into the data. A rule for determining the number of
classes, the “2 to the k rule,” is described below. The size of the class interval may be a
value such as 3, 5, 10, 15, 20, 50, 100, 1,000, and so on.
Class Interval: $i \ge \frac{H - L}{k}$   [2-1]
Where:
i is the class interval.
H is the highest observed value.
L is the lowest observed value.
k is the number of classes.
If we apply the formula to our example, then H = 10, L = 2, and k = 5. We get a class interval
of 2, found by:
$$i \ge \frac{H - L}{k} = \frac{10 - 2}{5} = \frac{8}{5} = 1.6$$
which is rounded up to 2.
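The same arithmetic can be checked with a short Python sketch. The function name `class_interval` is our own, and rounding up to the next whole number is an assumption made for this example; in practice any convenient interval at least as large as the computed value may be chosen.

```python
import math

def class_interval(highest, lowest, num_classes):
    """Return the raw value of (H - L) / k and that value rounded up."""
    raw = (highest - lowest) / num_classes
    return raw, math.ceil(raw)  # rounding up is an assumption of this sketch

raw, interval = class_interval(highest=10, lowest=2, num_classes=5)
print(raw, interval)  # 1.6 2
```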
Each class has a lower class limit and an upper class limit. The lower limit of the first class is
usually slightly below the smallest value and is a multiple of the class interval.
In the previous example, the smallest number of years of service is 2. Therefore, we selected
1, which is slightly below 2, as the lower limit of the first class. The lower limit of the second
class is 3 years, and so on.
The number of tallies that occurs in each class is called the class frequency.
In the example, the class frequency of the lowest class is 2. For the next higher class it is 6.
The class midpoint divides a class into two equal parts.
Class midpoint: The point halfway between the upper and lower limit of a class.
The class midpoint is computed by adding the lower limits of two consecutive classes and
dividing the result by two. Note that the class midpoint is also called the class mark.
In the example, the class midpoint of the 5 up to 7 class is 6, found by (5 + 7)/2. The class
interval is the distance between the lower limits of two consecutive classes. It is 2, found by
subtracting 1 (the lower limit of the first class) from 3 (the lower limit of the second class).
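As a quick check, the class interval and the midpoints of the years-of-service classes can be computed from the lower limits alone. The variable names below are ours; the sketch assumes equal class intervals, as in the example.

```python
# Lower limits of the classes 1 up to 3, 3 up to 5, ..., 9 up to 11.
lower_limits = [1, 3, 5, 7, 9]

# Class interval: distance between the lower limits of two consecutive classes.
interval = lower_limits[1] - lower_limits[0]

# Class midpoint: halfway between a lower limit and the next lower limit.
midpoints = [(lower + (lower + interval)) / 2 for lower in lower_limits]

print(interval)   # 2
print(midpoints)  # [2.0, 4.0, 6.0, 8.0, 10.0]
```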
1. The class intervals used in the frequency distribution should be equal. Unequal class
intervals present problems in graphically portraying the distribution. However, in
some situations unequal class intervals may be necessary in order to avoid a large
number of empty classes.
2. Text formula [2-1] above is based on the number of classes, and is useful for
determining the class interval.
3. Your professional judgment can determine the number of classes. Too many classes or too
few classes might not reveal the basic shape of the distribution. A general rule is that it
is best to use at least 5 and not more than 15 classes when constructing a frequency
distribution.
4. The “2 to the k rule” is also used to determine the number of classes. To estimate the
number of classes we select the smallest integer (whole number) k such that $2^k \ge n$, where
n is the total number of observations. Suppose a set of data has 60 observations. If we try
k = 5, we get $2^5 = 32$, which is less than 60, so we try k = 6 and get $2^6 = 64$, which is
greater than 60. Thus the recommended number of classes is 6. The table is based on the
“2 to the k rule.”

2 to the k Rule for Number of Classes
Total Number of Observations    Recommended Number of Classes
9 – 16                          4
17 – 32                         5
33 – 64                         6
65 – 128                        7
129 – 256                       8
257 – 512                       9
513 – 1,024                     10

5. The lower limit of the first class should be an even multiple of the class interval.
Suppose a sample of weight losses ranged from 25 pounds to 64
pounds. We want to organize the weight losses into a frequency distribution with an
interval of 6 pounds. The lower limit of the first class would be 24, found by
multiplying 4, the even multiple, by 6, the class interval. Obviously this suggestion
was not followed in the above example. Keep in mind that these are only suggestions
not rules.
6. Avoid overlapping stated class limits. Class limits such as 4-6 and 6-8 should not be used.
Use 4 up to 6, then 6 up to 8. This way you can determine in which class to tally 6.
7. Try to avoid open-ended classes. Open-ended classes cause serious graphing problems
and make it difficult to calculate various measures.
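Suggestions 4 and 5 lend themselves to a small Python sketch. The function names are ours, and the code is only one way to express the two suggestions: the first function searches for the smallest k whose power of 2 reaches n, and the second finds the largest multiple of the class interval that does not exceed the smallest observation.

```python
def classes_by_2_to_the_k(n):
    """Smallest whole number k such that 2**k is at least n."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    k = 0
    while 2 ** k < n:
        k += 1
    return k

def first_lower_limit(smallest, interval):
    """Largest multiple of the class interval not exceeding the smallest value."""
    return (smallest // interval) * interval

print(classes_by_2_to_the_k(60))  # 6
print(classes_by_2_to_the_k(17))  # 5
print(first_lower_limit(25, 6))   # 24
```

For the 17 employees the rule gives 5 classes, which agrees with the k = 5 used earlier. For the years-of-service data, `first_lower_limit(2, 2)` gives 2 rather than the 1 actually used, which is the departure from suggestion 5 noted above.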
It is often helpful to know what percent the class frequencies are of the total number of
observations.
Relative class frequency: Shows what percent each class is of the total number of
observations (frequencies).
The relative class frequency is found by dividing each of the class frequencies by the total
number of observations.
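For the years-of-service example, the relative class frequencies could be computed as follows. This is a minimal sketch; the variable names are ours, and rounding to one decimal place is a presentation choice.

```python
class_labels = ["1 up to 3", "3 up to 5", "5 up to 7", "7 up to 9", "9 up to 11"]
frequencies = [2, 6, 5, 3, 1]

total = sum(frequencies)  # 17 observations
relative = [f / total for f in frequencies]
percents = [round(100 * r, 1) for r in relative]

for label, pct in zip(class_labels, percents):
    print(f"{label} years: {pct}%")
```

About 35.3 percent of the employees (6 of 17) have from 3 up to 5 years of service, and the relative frequencies add to 100 percent.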
Histogram
Histogram: A graph in which the classes are marked on the horizontal axis and the class
frequencies on the vertical axis. The class frequencies are represented by the heights of
the bars and the bars are drawn adjacent to each other.
Frequency Polygon
A second type of chart used to portray a frequency distribution is the frequency polygon.
Frequency Polygon: A graph that consists of line segments connecting the points formed
by the intersection of the class midpoints and the class frequency.
For the distribution of years of service, make the first plot by selecting 2 years on the
X-axis (the midpoint) and then going up to 2, the class frequency, on the Y-axis.
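The points of the frequency polygon for the years-of-service data can be listed with a short Python sketch. The names are ours, and the sketch only builds the coordinates and the line segments joining them; drawing them is left to whatever charting tool is at hand.

```python
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11)]
frequencies = [2, 6, 5, 3, 1]

# Each point is (class midpoint, class frequency).
points = [((lower + upper) / 2, f) for (lower, upper), f in zip(classes, frequencies)]

# The polygon is the set of line segments joining consecutive points.
segments = list(zip(points, points[1:]))

print(points[0])      # (2.0, 2)
print(len(segments))  # 4
```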
A cumulative frequency polygon reports the number and percent of observations that occur
less than a given value.
Cumulative Frequency Polygon: A graph that shows the number of observations below a
certain value.
The cumulative frequencies are plotted on the vertical axis (Y-axis) and the lengths of service
on the X-axis.
[Figure: cumulative frequency polygon. Left vertical axis: Cumulative Frequency (0 to 20);
right vertical axis: Percent (0% to 100%); horizontal axis: Years of Service (0 to 10).]
It may be helpful to plot the cumulative frequencies on the left side of the vertical axis and
the percent of the total on the right side as shown in the polygon above.
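The two vertical scales can be computed from the class frequencies. The sketch below uses the years-of-service frequencies; the variable names are ours, and each cumulative count is paired with the upper limit of its class, since it counts the observations below that value.

```python
from itertools import accumulate

upper_limits = [3, 5, 7, 9, 11]  # upper limit of each class, in years
frequencies = [2, 6, 5, 3, 1]

cumulative = list(accumulate(frequencies))  # running totals
total = cumulative[-1]
percents = [round(100 * c / total, 1) for c in cumulative]

for limit, c, pct in zip(upper_limits, cumulative, percents):
    print(f"Less than {limit} years: {c} employees ({pct}%)")
```

For instance, 13 of the 17 employees (76.5 percent) have less than 7 years of service.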
The range of values is from $6 to $42. The first digit of each number is the stem (the
leading digit) and the second digit is the leaf (the trailing digit).
... characteristics in the data.
[Figure: line chart. Vertical axis: Millions ($0 to $20,000); horizontal axis: Fiscal Year
(1985 to 2001).]
The simple line chart displays information over a period of time. Time is
always scaled on the horizontal axis. In the line chart a line connects the values for various
periods.
As an example, shown is a line chart illustrating the revenue for the Microsoft Corporation
from the years 1985 through 2001. The revenue ranged from about $140 million in 1985 to
$25 billion in 2001.
The bar chart is often used to display categories. For a bar chart, bars represent the data for
each period. The bars can be shown as vertical or horizontal bars. As an example, shown is
the prime lending rate for the present (May, 2002), six months ago, and a year ago.
[Figure: bar chart of the prime lending rate. Vertical axis: 0.00% to 8.00%; bars: Now,
6 mos. Ago, Year Ago.]
Note that bar charts and histograms, discussed earlier, both used rectangles to represent the
data. The difference between the two graphs is that the bars in a histogram touch each other
because the data is continuous.
Another popular chart is a pie chart. Its purpose is to show the relative comparison between
parts of a total. Suppose we want to show where our tax dollar goes.

Tax Dollar Distribution
Category         Percent
Roads            20
Education        40
Welfare          15
Salaries         18
Miscellaneous    7

[Figure: pie chart with slices Roads 20%, Education 40%, Welfare 15%, Salaries 18%,
Miscel. 7%.]
After drawing a circle (pie) we put 0 at the top and go around the circle in increments of 5.
To plot the percent going for roads we draw a line from 0 to the center of the circle and
another line from the center to 20. Next, 20 + 40 = 60, so we draw a line from the center to
60; the area between 20 and 60 represents the amount going for education. This process is
continued for the remaining items.
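The slice boundaries around the circle are running totals of the percents. A minimal Python sketch for the tax dollar example follows; the variable names are ours, and the positions are on the 0-to-100 scale marked around the pie.

```python
from itertools import accumulate

shares = {
    "Roads": 20,
    "Education": 40,
    "Welfare": 15,
    "Salaries": 18,
    "Miscellaneous": 7,
}

ends = list(accumulate(shares.values()))  # where each slice stops
starts = [0] + ends[:-1]                  # where each slice begins
slices = {name: (s, e) for name, s, e in zip(shares, starts, ends)}

for name, (s, e) in slices.items():
    print(f"{name}: from {s} to {e}")
```

The last slice ends at 100, which confirms that the percents account for the whole pie.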
Chapter Problems
Problem 2.1
A sample of 30 homes sold during the past year by Gomminger Realty Company was
selected for study. (Selling price is reported in thousands of dollars.)
i) Organize these data into a frequency distribution and interpret your results.
ii) Based on the information from Gomminger Realty in Problem i), develop a histogram.
iii) Based on the information from Problems i) and ii), construct a frequency polygon.
ii)
[Figure: Histogram. Vertical axis: Frequency (0 to 10); horizontal axis: Class ($65 up to $70,
$70 up to $75, $75 up to $80, $80 up to $85, $85 up to $90); data labels on the bars: 9, 8, 5,
5, 3.]
iii)
[Figure: Frequency Polygon. Vertical axis: Frequency (0 to 10); horizontal axis: Mid values
(67.5, 72.5, 77.5, 82.5, 87.5).]
iv)
[Figure: cumulative frequency chart. Vertical axis: Cumulative Frequency (0 to 30); horizontal
axis: Class boundary ($65 up to $70, $70 up to $75, $75 up to $80, $80 up to $85, $85 up to
$90).]
Problem 2.2
The percent of disposable income (disposable income is the amount of income left after
taxes) spent for groceries for the period from 1975 to 2000 is shown below. Draw a line chart
to depict the trend.
Answer:
[Figure: Line Chart. Vertical axis: % of Disposable Income (0.00% to 14.00%); horizontal axis:
Year (1970 to 2005).]
Problem 2.3
The purpose of home equity loans by the Home Bank and the percent of each type of loan
relative to the total are shown. Portray the home equity loans information in the form of a
pie chart.
Loan Purpose         Percent of Total    Cumulative Percent
Home improvement     32                  32
Debt consolidation   30                  62
Car purchase         11                  73
Education            10                  83
Other                9                   92
Investments          8                   100
Answer:
[Figure: pie chart of home equity loans by loan purpose.]
Exercise 2.1
The Jansen Motor Company has developed a new engine to further reduce gasoline
consumption. The new engine was put in 25 mid-sized cars and the number of miles per
gallon recorded (to the nearest mile per gallon).
29 32 20 30 39
27 28 21 36 20
27 18 32 37 29
30 23 25 19 30
33 24 18 23 34
ii) Use the Jansen Motor Company data in Exercise 2.1 to construct a histogram.
iii) Use the Jansen Motor Company data in Exercise 2.1 to construct a frequency
polygon.
Exercise 2.2
The expenditures on research and development for the Hennen Manufacturing Company
are given. Construct a simple line chart. Also construct a simple bar chart.
Exercise 2.3
Refer to Problem 2.2. Develop a simple bar chart for the percent of disposable income spent
for groceries.
Exercise 2.4
The data depicts new cars sold in the United States during the year, classified by
manufacturer. Portray the “new cars sold” data in the form of a pie chart.
Manufacturer       Cars Sold (Millions)
General Motors     3100
Ford               1900
DaimlerChrysler    800
Toyota             800
Honda              800
Nissan             500
Other              1100
Total              9000