
CHAPTER 2

DESCRIBING DATA: FREQUENCY DISTRIBUTIONS AND GRAPHIC PRESENTATION

Introduction

This chapter begins our study of descriptive statistics. Recall from Chapter 1 that when
using descriptive statistics we merely describe a set of data. For example, we want to
describe the entry-level salary for a select group of professions. We find that the entry-level
salary for accountants is $38,000, for systems analysts $48,000, for infectious disease
specialists $80,000, and so on. This unorganized data provides little insight into the pattern
of entry-level salaries, which makes conclusions difficult.

This chapter presents a technique that is used to organize raw data into some meaningful
form. It is called a frequency distribution. Then, to better understand the main features of
the data, we portray the frequency distribution in the form of a frequency polygon, a
histogram, or a cumulative frequency distribution.

Frequency Distributions

A frequency distribution is a useful statistical tool for organizing a mass of data into some
meaningful form.

Frequency Distribution: A grouping of data into mutually exclusive classes showing the
number of observations in each.

As noted, a frequency distribution is used to summarize and organize large amounts of data.

The steps to follow in developing a frequency distribution are:

1. Decide on the number of classes.

2. Determine the class interval or width.

3. Set the individual class limits.

4. Tally the observations into the appropriate classes and count the number of tallies (items) in each class.

Length of Service (in years)

4 332 210 10 6 6 6 6
5 8 4 8 4
6 2 3 3 7 5
10

As an example, the lengths of service, in years, of a sample of seventeen employees are given
above.

The seventeen observations are referred to as raw data or ungrouped data. To organize the lengths of service into a frequency distribution:

1. We decide to have five classes.

2. We used a class width of 2.

3. We used classes 1 up to 3, 3 up to 5, and so on.

4. Tally the lengths of service into the appropriate classes.

5. Count the number of tallies in each class as shown.

Frequency Distribution

Lengths of service     Tallies     Number of employees
1 up to 3 years        //          2
3 up to 5 years        //// /      6
5 up to 7 years        ////        5
7 up to 9 years        ///         3
9 up to 11 years       /           1
Total                              17

How many classes should there be? A common guideline is from 5 to 15. Having too few or
too many classes gives little insight into the data. A rule for determining the number of
classes is shown on the next page. The size of the class interval may be a value such as 3, 5,
10, 15, 20, 50, 100, 1,000, and so on.

Class Interval: The size or width of the class.

The class interval can be approximated by text formula [2-1]:

$$ i = \frac{H - L}{k} \qquad [2\text{-}1] $$

Where:
i is the class interval.
H is the highest observed value.
L is the lowest observed value.
k is the number of classes.

If we apply the formula to our example, then H = 10, L = 2, and k = 5. We get a class interval of 2, found by:

$$ i = \frac{10 - 2}{5} = \frac{8}{5} = 1.6 $$

which is rounded to 2.
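
The arithmetic behind formula [2-1] can be sketched in Python. The function name `class_interval` is hypothetical, and rounding the result up to the next whole number is an assumption that happens to match the choice of 2 in this example; in practice the interval is picked by judgment.

```python
import math


def class_interval(highest, lowest, num_classes):
    """Approximate class interval from formula [2-1]: i = (H - L) / k."""
    return (highest - lowest) / num_classes


# Values from the length-of-service example: H = 10, L = 2, k = 5.
raw_interval = class_interval(10, 2, 5)

# Assumption: round up to a convenient whole number of years.
chosen_interval = math.ceil(raw_interval)

print(raw_interval, chosen_interval)
```

Rounding up rather than down keeps k classes wide enough to cover the full range from L to H.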

Each class has a lower class limit and an upper class limit. The lower limit of the first class is
usually slightly below the smallest value and is a multiple of the class interval.

In the previous example, the smallest number of years of service is 2. Therefore, we selected
1, which is slightly below 2, as the lower limit of the first class. The lower limit of the second
class is 3 years, and so on.

The number of tallies that occurs in each class is called the class frequency.

Class frequency: The number of observations in each class.

In the example, the class frequency of the lowest class is 2. For the next higher class it is 6.
The class midpoint divides a class into two equal parts.

Class midpoint: The point halfway between the upper and lower limit of a class.

The class midpoint is computed by adding the lower limits of two consecutive classes and dividing the result by two. Note that the class midpoint is also called the class mark.

In the example, the class midpoint of the 5 up to 7 class is 6, found by (5 + 7)/2. The class interval is the distance between the lower limits of two consecutive classes. It is 2, found by subtracting 1 (the lower limit of the first class) from 3 (the lower limit of the second class).

Suggestions on Constructing Frequency Distributions

When constructing frequency distributions, follow these guidelines:

1. The class intervals used in the frequency distribution should be equal. Unequal class
intervals present problems in graphically portraying the distribution. However, in
some situations unequal class intervals may be necessary in order to avoid a large
number of empty classes.

2. Text formula [2-1] above is based on the number of classes, and is useful for determining the class interval.

$$ \text{Class interval } (i) = \frac{\text{highest value} - \text{lowest value}}{\text{number of classes}} \quad \text{or} \quad i = \frac{H - L}{k} \qquad [2\text{-}1] $$

3. Your professional judgment can determine the number of classes. Too many classes or too
few classes might not reveal the basic shape of the distribution. A general rule is that it
is best to use at least 5 and not more than 15 classes when constructing a frequency
distribution.

4. The “2 to the k rule” is also used to determine the number of classes. To estimate the number of classes we select the smallest integer (whole number) k such that $2^k \ge n$, where n is the total number of observations. Suppose a set of data has 60 observations. If we try k = 5, we get $2^5 = 32$, which is less than 60, so we try $2^6 = 64$, which is greater than 60. Thus the recommended number of classes is 6. The table is based on the “2 to the k rule.”

2 to the k Rule for Number of Classes

Total Number of Observations     Recommended Number of Classes
9 – 16                           4
17 – 32                          5
33 – 64                          6
65 – 128                         7
129 – 256                        8
257 – 512                        9
513 – 1,024                      10

5. The lower limit of the first class should be an even multiple of the class interval. Suppose a sample of weight losses ranged from 25 pounds to 64 pounds. We want to organize the weight losses into a frequency distribution with an interval of 6 pounds. The lower limit of the first class would be 24, found by multiplying 4, the even multiple, by 6, the class interval. Obviously this suggestion was not followed in the above example. Keep in mind that these are only suggestions, not rules.

6. Avoid overlapping stated class limits. Class limits such as 4-6 and 6-8 should not be used.
Use 4 up to 6, then 6 up to 8. This way you can determine in which class to tally 6.

7. Try to avoid open-ended classes. Open-ended classes cause serious graphing problems
and make it difficult to calculate various measures.
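
Suggestions 4 and 5 lend themselves to a short Python sketch. The function names `recommended_classes` and `first_lower_limit` are hypothetical. The sketch assumes the rule means the smallest k with 2 to the k at least n, which is what the table implies (16 observations map to 4 classes), and it reads "even multiple" as an exact multiple of the class interval.

```python
def recommended_classes(n):
    """Smallest whole number k such that 2**k >= n (the '2 to the k rule')."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def first_lower_limit(lowest_value, interval):
    """Largest multiple of the class interval that does not exceed the lowest value."""
    return (lowest_value // interval) * interval


print(recommended_classes(60))   # 60 observations, as in suggestion 4
print(first_lower_limit(25, 6))  # weight losses starting at 25, interval of 6
```

For 60 observations the sketch returns 6 classes, and for the weight-loss sample it returns a lower limit of 24, both matching the worked values in the text.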

Relative Frequency Distribution

It is often helpful to know what percent the class frequencies are of the total number of
observations.


Relative class frequency: Shows what percent each class is of the total number of
observations (frequencies).

The relative class frequency is found by dividing each of the class frequencies by the total number of observations.

Using the distribution of the lengths of service of the seventeen employees, the relative frequency for the 1 up to 3-year class is 0.1176, found by 2/17 = 0.1176, or about 12%. Thus 12% of the employees had 1 up to 3 years of service.

The relative frequencies for the remaining classes are shown. The total is 0.9999 rather than 1 because each relative frequency is rounded to four decimal places.

Relative Frequency Distribution

Length of service (in years)     Number of employees     Relative Frequency     Found by
1 up to 3 years                  2                       0.1176                 2/17
3 up to 5 years                  6                       0.3529                 6/17
5 up to 7 years                  5                       0.2941                 5/17
7 up to 9 years                  3                       0.1765                 3/17
9 up to 11 years                 1                       0.0588                 1/17
Total                            17                      0.9999
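
As a quick check on the table, the relative frequencies can be recomputed in Python. This is a minimal sketch; the variable names are illustrative, and rounding to four decimal places is chosen only to match the table.

```python
# Class frequencies from the length-of-service frequency distribution.
frequencies = {
    "1 up to 3 years": 2,
    "3 up to 5 years": 6,
    "5 up to 7 years": 5,
    "7 up to 9 years": 3,
    "9 up to 11 years": 1,
}

total = sum(frequencies.values())

# Relative frequency = class frequency / total number of observations.
relative = {label: round(f / total, 4) for label, f in frequencies.items()}

for label, value in relative.items():
    print(f"{label:<18} {value:.4f}")

# The rounded values need not sum to exactly 1.
rounded_total = round(sum(relative.values()), 4)
print("Total", rounded_total)
```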

Graphic Presentation of a Frequency Distribution

To get the reader's attention, a frequency distribution is often portrayed graphically as a histogram, a frequency polygon, or a cumulative frequency polygon.

Histogram

The simplest type of a statistical chart is called a histogram.

Histogram: A graph in which the classes are marked on the horizontal axis and the class
frequencies on the vertical axis. The class frequencies are represented by the heights of
the bars and the bars are drawn adjacent to each other.

For the length of service for the sample of seventeen employees, a histogram would appear as shown in the figure.

[Figure: Histogram. Horizontal axis: Length of Service (years), 0 to 12; vertical axis: Number of Employees, 0 to 8.]

Note that to plot the bar for the 5 up to 7 years class (which has a midpoint of 6 years), we drew lines vertically from 5 and from 7 years up to 5 employees on the Y-axis and then connected the end points by a straight line.

The histogram provides an easily interpreted visual representation of a frequency distribution.

Frequency Polygon

A second type of chart used to portray a frequency distribution is the frequency polygon.

Frequency Polygon: A graph that consists of line segments connecting the points formed
by the intersection of the class midpoints and the class frequency.

For the frequency polygon, the assumption is that the observations in any class interval are represented by the class midpoint. A dot is placed at the class midpoint opposite the number of frequencies in that class. For the distribution of years of service, make the first plot by selecting 2 years on the X-axis (the midpoint) and then go vertically on the Y-axis to 2, the class frequency, and place a dot. This process is continued for all classes. Then connect the dots in order.

[Figure: Frequency Polygon. Horizontal axis: Years of Service, 0 to 12; vertical axis: Class Frequency, 0 to 8.]

Normal practice is to anchor the frequency polygon to the X-axis. This is accomplished by extending the lines to the midpoint of the class below the lowest class (0) and to the midpoint of the class above the highest class (12).
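
The points of the frequency polygon can be listed with a short Python sketch. It uses the class limits and frequencies from the length-of-service distribution; the variable names are illustrative, and the two anchor points assume equal class widths.

```python
# Class limits (lower, upper) and class frequencies for the years-of-service data.
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11)]
frequencies = [2, 6, 5, 3, 1]

# Class midpoint: halfway between the lower and upper limit of a class.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Assumption: all classes share the width of the first class.
width = classes[0][1] - classes[0][0]

# Anchor the polygon to the X-axis one class below and one class above.
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

for x, y in points:
    print(x, y)
```

The first and last points fall at 0 and 12 on the X-axis with a frequency of zero, which is what closes the polygon.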

Cumulative Frequency Distribution

A cumulative frequency polygon reports the number and percent of observations that fall below a given value.

Cumulative Frequency Polygon: A graph that shows the number of observations below a
certain value.


Before we can draw a cumulative frequency polygon, we must convert the frequency distribution to a cumulative frequency distribution. To construct a cumulative frequency distribution, we add the frequencies from the lowest class to the frequency of the next highest class. We add this sum to the frequency of the next class, etc.

Cumulative Frequency Distribution

Length of service (in years)     Class Frequency     Cumulative Frequency     Found by
1 up to 3 years                  2                   2                        2
3 up to 5 years                  6                   8                        6 + 2
5 up to 7 years                  5                   13                       8 + 5
7 up to 9 years                  3                   16                       13 + 3
9 up to 11 years                 1                   17                       16 + 1

The cumulative frequencies are plotted on the vertical axis (Y-axis) and the lengths of service
on the X-axis.
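
The running total behind a cumulative frequency distribution can be sketched with `itertools.accumulate` from the Python standard library. The percent column is included because the cumulative polygon also reports percent of the total; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for the five length-of-service classes, lowest class first.
frequencies = [2, 6, 5, 3, 1]

# Each cumulative frequency is the previous running total plus the next class frequency.
cumulative = list(accumulate(frequencies))

# Cumulative percent of the total number of observations.
total = cumulative[-1]
cumulative_percent = [round(100 * c / total, 1) for c in cumulative]

print(cumulative)
print(cumulative_percent)
```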

[Figure: Cumulative Frequency Polygon. Horizontal axis: Years of Service, 0 to 10; left vertical axis: Cumulative Frequency, 0 to 20; right vertical axis: Percent, 0% to 100%.]

It may be helpful to plot the cumulative frequencies on the left side of the vertical axis and
the percent of the total on the right side as shown in the polygon above.

The range of values is from $6 to $42. The first digit of each number is the stem and the second digit is the leaf. The first customer (upper left) spent $12. Hence, the stem value is 1 and the leaf value is 2. The completed display after each trailing digit is arranged from low to high is shown.

Leading Digit     Trailing Digit
0                 6
1                 278
2                 2468
3                 246
4                 2
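
The stem-and-leaf construction can be sketched in Python. The raw customer amounts are not listed in this section, so the values below are read back from the completed display and their original order is unknown; that reconstruction, and the function name `stem_and_leaf`, are assumptions made for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit values by leading digit (stem), with sorted trailing digits (leaves)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return {
        stem: "".join(str(leaf) for leaf in leaves)
        for stem, leaves in sorted(stems.items())
    }


# Assumption: amounts recovered from the display, not the original raw listing.
amounts = [12, 6, 17, 18, 22, 24, 26, 28, 32, 34, 36, 42]

display = stem_and_leaf(amounts)
for stem, leaves in display.items():
    print(stem, leaves)
```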


Other Graphical Techniques

Several other charts are discussed in this section. Each is designed to emphasize certain characteristics in the data.

The simple line chart displays information over a period of time. Time is always scaled on the horizontal axis. In the line chart a line connects the values for various periods.

[Figure: Microsoft Revenue. Horizontal axis: Fiscal Year, 1985 to 2001; vertical axis: Millions, $0 to $30,000.]

As an example, shown is a line chart illustrating the revenue for the Microsoft Corporation
from the years 1985 through 2001. The revenue ranged from about $140 million in 1985 to
$25 billion in 2001.

The bar chart is often used to display categories. For a bar chart, bars represent the data for
each period. The bars can be shown as vertical or horizontal bars. As an example, shown is
the prime lending rate for the present (May, 2002), six months ago, and a year ago.

[Figure: Prime Lending Rate. Bar chart with bars for Now, 6 mos. Ago, and Year Ago; vertical axis: 0.00% to 8.00%.]

Note that bar charts and histograms, discussed earlier, both use rectangles to represent the data. The difference between the two graphs is that the bars in a histogram touch each other because the data are continuous.


Another popular chart is a pie chart. Its purpose is to show the relative comparison between parts of a total. Suppose we want to show where our tax dollar goes.

Tax Dollar Distribution

Category          Percent
Roads             20
Education         40
Welfare           15
Salaries          18
Miscellaneous     7

[Figure: Tax Dollar Spending. Pie chart: Education 40%, Roads 20%, Salaries 18%, Welfare 15%, Miscel. 7%.]

After drawing a circle (pie) we put 0 on the top and go around the circle in increments of 5. To plot the percent going for roads we draw a line from 0 to the center of the circle and another line from the center to 20. Then, 20 + 40 = 60, so the next line is drawn from the center to 60. The area between 20 and 60 represents the amount going for education. This process is continued for the remaining items.
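
The slice boundaries described above are running totals of the percents, which a short Python sketch makes explicit. The function name `pie_slices` is hypothetical, and converting each percent to degrees (360 degrees for the whole pie) is an added convenience that the text does not use.

```python
def pie_slices(percents):
    """Return (label, start, end, degrees) for each slice, going around from 0 to 100."""
    slices = []
    start = 0
    for label, pct in percents.items():
        end = start + pct
        # A full circle is 360 degrees, so each percent point spans 3.6 degrees.
        slices.append((label, start, end, pct * 360 / 100))
        start = end
    return slices


# Percents from the Tax Dollar Distribution table.
tax_dollar = {
    "Roads": 20,
    "Education": 40,
    "Welfare": 15,
    "Salaries": 18,
    "Miscellaneous": 7,
}

slices = pie_slices(tax_dollar)
for label, start, end, degrees in slices:
    print(f"{label:<14} from {start:>3} to {end:>3}  ({degrees:.1f} degrees)")
```

The last slice should end at exactly 100; if it does not, the percents do not add up to a whole pie.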

Chapter Problems

Problem 2.1

A sample of 30 homes sold during the past year by Gomminger Realty Company was
selected for study. (Selling price is reported in thousands of dollars.)

$76 $74 $71 $78 $80 $67
80 82 67 88 72 78
85 76 84 82 83 80
72 82 77 79 86 89
69 70 82 86 78 77

i) Organize these data into a frequency distribution and interpret your results.

ii) Based on the information from Gomminger Realty in Problem i), develop a histogram.

iii) Based on the information from Problems i) and ii), construct a frequency polygon.

iv) Construct a cumulative frequency polygon.

Answer: i)

Class limits ($000)     Class Frequency     Cumulative Frequency
$65 up to $70           3                   3
$70 up to $75           5                   8
$75 up to $80           8                   16
$80 up to $85           9                   25
$85 up to $90           5                   30

ii)

[Figure: Histogram. Horizontal axis: Class ($65 up to $70 through $85 up to $90); vertical axis: Frequency, 0 to 10; bar heights 3, 5, 8, 9, 5.]

iii)

[Figure: Frequency Polygon. Horizontal axis: Mid values (67.5, 72.5, 77.5, 82.5, 87.5); vertical axis: Frequency, 0 to 10.]


iv)

[Figure: Cumulative Frequency Curve. Horizontal axis: Class boundary ($65 up to $70 through $85 up to $90); vertical axis: Cumulative Frequency, 0 to 35.]
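
The tallying in answer i) can be checked with a short Python sketch. It uses the 30 selling prices from Problem 2.1 and the classes from the answer table (lower limit 65, interval 5, five classes); the function name `frequency_distribution` is hypothetical.

```python
from itertools import accumulate

# Selling prices in thousands of dollars, from Problem 2.1.
prices = [
    76, 74, 71, 78, 80, 67,
    80, 82, 67, 88, 72, 78,
    85, 76, 84, 82, 83, 80,
    72, 82, 77, 79, 86, 89,
    69, 70, 82, 86, 78, 77,
]


def frequency_distribution(values, first_lower_limit, interval, num_classes):
    """Tally values into classes of the form 'lower up to upper' (upper limit excluded)."""
    counts = [0] * num_classes
    for value in values:
        index = (value - first_lower_limit) // interval
        if not 0 <= index < num_classes:
            raise ValueError(f"{value} falls outside the classes")
        counts[index] += 1
    return counts


counts = frequency_distribution(prices, first_lower_limit=65, interval=5, num_classes=5)
cumulative = list(accumulate(counts))

for i, (count, running) in enumerate(zip(counts, cumulative)):
    lower = 65 + 5 * i
    print(f"${lower} up to ${lower + 5}: {count:>2} {running:>3}")
```

Because each class runs "up to" its upper limit, a price of exactly 80 is tallied in the $80 up to $85 class, not the class below it.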

Problem 2.2

The percent of disposable income (disposable income is the amount of income left after
taxes) spent for groceries for the period from 1975 to 2000 is shown below. Draw a line chart
to depict the trend.

Year     Disposable Income spent on groceries
1975     13.0%
1980     12.3%
1985     11.2%
1990     10.1%
1995     9.5%
2000     9.3%


Answer:

[Figure: Line Chart. Horizontal axis: Year, 1970 to 2005; vertical axis: % of Disposable Income, 0.00% to 14.00%.]

Problem 2.3

The purpose of home equity loans by the Home Bank and the percent of each type of loan
relative to the total are shown. Portray the home equity loans information in the form of a
pie chart.

Loan Purpose           Percent of Total     Cumulative Percent
Home improvement       32                   32
Debt consolidation     30                   62
Car purchase           11                   73
Education              10                   83
Other                  9                    92
Investments            8                    100

Answer:

[Figure: % of Total Loan Purpose. Pie chart: Home improvement 32, Debt consolidation 30, Car purchase 11, Education 10, Other 9, Investments 8.]


Exercise 2.1

The Jansen Motor Company has developed a new engine to further reduce gasoline
consumption. The new engine was put in 25 mid-sized cars and the number of miles per
gallon recorded (to the nearest mile per gallon).

29 32 20 30 39
27 28 21 36 20
27 18 32 37 29
30 23 25 19 30
33 24 18 23 34

i) Develop a frequency distribution. Use a class interval of 5, with 15 as the lower limit of the first class.

ii) Use the Jansen Motor Company data in Exercise 2.1 to construct a histogram.

iii) Use the Jansen Motor Company data in Exercise 2.1 to construct a frequency
polygon.

iv) Construct a cumulative frequency polygon.

Exercise 2.2
The expenditures on research and development for the Hennen Manufacturing Company
are given. Construct a simple line chart. Also construct a simple bar chart.

Year Expenditure (in $000)


1995 94
1996 103
1997 115
1998 145
1999 175
2000 203
2001 190


Exercise 2.3
Refer to Problem 2.2. Develop a simple bar chart for the percent of disposable income spent
for groceries.

Exercise 2.4
The data depict new cars sold in the United States during the year, classified by manufacturer. Portray the “new cars sold” data in the form of a pie chart.

Manufacturer        Cars Sold (Millions)
General Motors      3100
Ford                1900
DaimlerChrysler     800
Toyota              800
Honda               800
Nissan              500
Other               1100
Total               9000
