Chapter 2: Graphical Descriptions of Data
Objectives
Construct and interpret a frequency distribution.
Summarize data using the basic types of graphs, interpret the results, and
detect outliers.
Identify suitable graphs for different types of variables.
Frequency Tables
• Divide a set of data into classes or intervals.
• A frequency table or frequency distribution is a table that shows the number
of data items in each class or interval.
• A frequency table is a method of organizing large sets of data.
Example 2.1: Attendance in an intro stat section on each day of a semester.
45, 47, 43, 40, 38, 36, 23, 35, 44, 26, 32, 35, 40, 38, 38, 39, 36, 37, 45, 35, 36, 37,
38, 36, 33, 35, 36, 40, 45
Class Limits    # of days (tally)        Class Limits    Frequency
20 – 29                                  20 – 29
30 – 39                                  30 – 39
40 – 49                                  40 – 49
The frequency of a class, denoted by f, is the number of data points in it.
The class limits are written a – b, where a is the lower class limit and b is the
upper class limit.
The class width, $a_2 - a_1$, is the lower class limit of one interval ($a_2$)
minus the lower class limit of the next lower interval ($a_1$).
The class midpoint of an interval is $(a + b)/2$.
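The definitions of class width and class midpoint can be checked numerically. The sketch below uses the classes from Example 2.1; the function names `class_width` and `class_midpoint` are illustrative choices, not part of the notes.

```python
# Classes from Example 2.1, written as (lower class limit, upper class limit).
classes = [(20, 29), (30, 39), (40, 49)]

def class_width(classes):
    """Lower limit of one class minus the lower limit of the next lower class."""
    return classes[1][0] - classes[0][0]

def class_midpoint(lower, upper):
    """Midpoint (a + b) / 2 of a class with limits a - b."""
    return (lower + upper) / 2

width = class_width(classes)
midpoints = [class_midpoint(a, b) for a, b in classes]

print("class width:", width)
print("midpoints:", midpoints)
```

Note that the width is 10, not 29 − 20 = 9: it is measured between consecutive lower limits, not between the two limits of a single class.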
Relative Frequency:
The relative frequency of a class is the proportion of all data values that fall into
that class. To find the relative frequency of a particular class, divide the class
frequency f by the total of all frequencies n (sample size).
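As a minimal sketch, the frequency f and the relative frequency f / n of each class can be computed for the attendance data in Example 2.1. The helper name `frequency_table` is an illustrative assumption.

```python
# Attendance data from Example 2.1.
attendance = [45, 47, 43, 40, 38, 36, 23, 35, 44, 26, 32, 35, 40, 38, 38, 39,
              36, 37, 45, 35, 36, 37, 38, 36, 33, 35, 36, 40, 45]
classes = [(20, 29), (30, 39), (40, 49)]

def frequency_table(data, classes):
    """Return (lower, upper, frequency, relative frequency) for each class."""
    n = len(data)  # total of all frequencies (sample size)
    rows = []
    for lower, upper in classes:
        f = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, f, f / n))
    return rows

table = frequency_table(attendance, classes)
for lower, upper, f, rel in table:
    print(f"{lower} - {upper}: f = {f}, relative frequency = {rel:.3f}")
```

The relative frequencies should always add up to 1, which is a quick check on the tally.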
Class Boundaries
• When the data are integers only, there is a gap between the upper limit of one
class and the lower limit of the next.
• The class boundaries are the halfway points of these gaps.
• To find the upper class boundary, add 0.5 to the upper class limit.
• To find the lower class boundary, subtract 0.5 from the lower class limit.
Example 2.2:
Class      Class Boundaries    Midpoint    Frequency    Relative Frequency
20 – 29
30 – 39
40 – 49
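The rule for class boundaries can be sketched in code. The function below takes half the gap between consecutive classes, which is 0.5 for integer data, and applies it to both limits; the name `class_boundaries` is an illustrative assumption.

```python
# Classes from Example 2.2, written as (lower class limit, upper class limit).
classes = [(20, 29), (30, 39), (40, 49)]

def class_boundaries(classes):
    """Return (lower boundary, upper boundary) for each class."""
    # Gap between the upper limit of the first class and the lower limit
    # of the second; for integer-only data this gap is 1.
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]

boundaries = class_boundaries(classes)
for (lower, upper), (lb, ub) in zip(classes, boundaries):
    print(f"limits {lower} - {upper}: boundaries {lb} - {ub}")
```

The upper boundary of each class equals the lower boundary of the next, so adjacent histogram bars touch.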
Example 2.3: The reaction times (in milliseconds) of a sample of 40 adult females
to an auditory stimulus are listed below. Construct a frequency distribution and
histogram of the data.
507 359 305 291 336 310 514 442 307 337
373 358 387 354 323 441 388 386 469 351
411 382 320 450 309 416 359 388 422 413
326 510 460 350 461 387 321 486 423 456
Class Limits    Class Boundaries    Tally    Midpoint    Frequency (f)    Relative Frequency
Steps to construct a frequency distribution:
Determine the data range
Choose number of classes (usually 5-10)
Choose class width
Determine class limits
Calculate class boundary, frequency, relative frequency, etc., for each class.
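The steps above can be sketched for the reaction-time data in Example 2.3. This is one possible construction under stated assumptions: six classes are used, the first lower limit is the minimum value, and the width is the range divided by the number of classes, rounded down and increased by 1 so that the maximum is always covered. Other choices, such as a round starting limit or a width of 40, are equally valid.

```python
# Reaction times (ms) from Example 2.3.
reaction_times = [
    507, 359, 305, 291, 336, 310, 514, 442, 307, 337,
    373, 358, 387, 354, 323, 441, 388, 386, 469, 351,
    411, 382, 320, 450, 309, 416, 359, 388, 422, 413,
    326, 510, 460, 350, 461, 387, 321, 486, 423, 456,
]

def build_classes(data, num_classes):
    """Steps 1-4: range, number of classes, class width, class limits."""
    data_range = max(data) - min(data)
    width = data_range // num_classes + 1  # assumes integer data
    lower = min(data)                      # assumption: start at the minimum
    classes = []
    for _ in range(num_classes):
        classes.append((lower, lower + width - 1))
        lower += width
    return classes

def frequency_distribution(data, num_classes):
    """Step 5: boundaries, midpoint, frequency, relative frequency per class."""
    n = len(data)
    rows = []
    for lower, upper in build_classes(data, num_classes):
        f = sum(1 for x in data if lower <= x <= upper)
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "frequency": f,
            "relative_frequency": f / n,
        })
    return rows

rows = frequency_distribution(reaction_times, num_classes=6)
for row in rows:
    print(row["limits"], row["boundaries"], row["midpoint"],
          row["frequency"], round(row["relative_frequency"], 3))
```

Changing `num_classes` shows how the choice of classes affects the apparent shape of the distribution.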
Histogram
• Histograms and relative-frequency histograms provide effective visual
displays of data organized into frequency tables.
• Horizontal axis: Classes/intervals
• Vertical axis: Frequencies or relative frequencies
• Note: histograms are most useful for large data sets.
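A rough text version of a histogram can be produced without any plotting library. The sketch below uses the attendance data and classes from Example 2.1 and draws one `*` per data point; the bars run horizontally, so the classes appear on the vertical axis rather than the horizontal one. The function name `text_histogram` is an illustrative assumption.

```python
# Attendance data from Example 2.1.
attendance = [45, 47, 43, 40, 38, 36, 23, 35, 44, 26, 32, 35, 40, 38, 38, 39,
              36, 37, 45, 35, 36, 37, 38, 36, 33, 35, 36, 40, 45]
classes = [(20, 29), (30, 39), (40, 49)]

def text_histogram(data, classes, symbol="*"):
    """Return one line per class: the class limits, a bar, and the frequency."""
    lines = []
    for lower, upper in classes:
        f = sum(1 for x in data if lower <= x <= upper)
        lines.append(f"{lower}-{upper} | {symbol * f} ({f})")
    return lines

histogram_lines = text_histogram(attendance, classes)
print("Attendance per day (Example 2.1)")  # a meaningful title
for line in histogram_lines:
    print(line)
```

For a relative-frequency histogram, the bar lengths would be proportional to f / n instead of f; the shape stays the same and only the scale changes.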
Question: What type of data do histograms summarize?
Example 2.4: Draw a histogram using the frequency table in Example 2.3. Make
sure to include an appropriate title and axis labels. Describe the shape of the
distribution. Are there any outliers?
Example 2.5: The data represent the heights of 21 adults in centimeters.
172, 158, 167, 137, 190, 173, 173, 168, 178, 184, 165, 173, 179, 166, 168, 165,
150, 185, 140, 163, 165
(i) Construct a frequency distribution
(ii) Draw a histogram and a relative-frequency histogram
(iii) Make a couple of observations; any outliers?
Graphical display
When we display data, different shapes can be observed. A few commonly
observed shapes are listed below.
(i) Uniform
(ii) Bimodal
(iii) Bell-shaped
(iv) Skewed Right
(v) Skewed Left
Data presentation by graphs
Which graph or graphical display we use depends on the type of variable
(categorical or quantitative) we are dealing with. We have studied the histogram;
we now introduce a few more commonly used graphs.
Graphs to summarize numerical data
(a) Stem and leaf plot:
More useful for small data sets
Data values are retained
Question: what type of data does it summarize?
Example 2.6: Consider the data: 12, 21, 22, 27, 33, 35, 36, 37, 40, 40. Construct a
stem and leaf plot. Make a couple of observations about the distribution. Are
there any outliers?
Assume stem = tens, leaf = ones.
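A minimal sketch of the construction for the data in Example 2.6, with the tens digit as the stem and the ones digit as the leaf. The function name `stem_and_leaf` is an illustrative assumption; stems with no data are kept as empty rows so that gaps in the distribution remain visible.

```python
from collections import defaultdict

# Data from Example 2.6.
data = [12, 21, 22, 27, 33, 35, 36, 37, 40, 40]

def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted list of leaves (ones digits)."""
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    lowest, highest = min(stems), max(stems)
    # Include every stem between the lowest and highest, even if it is empty.
    return {s: stems.get(s, []) for s in range(lowest, highest + 1)}

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because every leaf is a digit of an original value, the data can be read back from the plot, which is what "data values are retained" means.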
(b) Dot plot:
More useful for small data sets
Data values are retained
It is often useful for checking whether a distribution is bell shaped
Question: what type of data does it summarize?
Question: Are there any outliers? If so, which ones?
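A dot plot can also be sketched in plain text. The example below reuses the data from Example 2.6 and prints one row per value on the number line, with one `o` per occurrence; the plot is rotated, so the number line runs down the page. The function name `dot_plot` is an illustrative assumption.

```python
from collections import Counter

# Data from Example 2.6.
data = [12, 21, 22, 27, 33, 35, 36, 37, 40, 40]

def dot_plot(data, dot="o"):
    """Return one line per integer from min to max, with one dot per occurrence."""
    counts = Counter(data)  # missing values count as 0
    return [f"{value:>3} | {dot * counts[value]}"
            for value in range(min(data), max(data) + 1)]

dot_lines = dot_plot(data)
for line in dot_lines:
    print(line)
```

Values far from the rest of the dots, separated by a long run of empty rows, are the candidates to examine as possible outliers.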
Example 2.7: The scores for the first round of a golf tournament are indicated
below.
(i) Make a stem and leaf plot
(ii) Make a dot plot
62 71 73 75
62 71 73 75
65 71 73 75
66 71 73 75
67 72 73 77
70 72 74 79
71 72 74 80
71 72 74 84
Graphs to summarize categorical data
(a) Pie chart:
This graph displays each category with respect to the total
Usually categories are compared by percentages
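The sketch below shows how category counts translate into pie-chart slices: each slice's percentage is 100 · f / n and its central angle is 360 · f / n degrees. The counts are hypothetical and chosen only for illustration; they are not the survey results behind any chart in these notes. The function name `pie_slices` is also an assumption.

```python
def pie_slices(counts):
    """Map each category to (percentage of total, central angle in degrees)."""
    total = sum(counts.values())
    return {category: (100 * f / total, 360 * f / total)
            for category, f in counts.items()}

# Hypothetical counts, for illustration only (not real survey data).
hypothetical_counts = {"Milk": 20, "Dark": 15, "White": 5}

slices = pie_slices(hypothetical_counts)
for category, (percent, angle) in slices.items():
    print(f"{category}: {percent:.1f}% of the total, {angle:.1f} degrees")
```

The percentages add up to 100 and the angles to 360, which is why a pie chart shows each category with respect to the total.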
Example 2.8: In a recent survey, introductory STAT students were asked, “What’s
your favorite type of chocolate?” The data are summarized by the pie-chart:
Question 1: What information can you extract from the pie-chart above?
Question 2: What type of data does a pie-chart summarize? What type of variable
is “chocolate-type?”
(b) Bar graph:
Frequencies for the different categories are shown by bars
Easy to compare the categories
Example 2.9: Refer to the data in Example 2.8
Question 1: Write one observation about the bar graph.
Question 2: What type of data do bar graphs summarize?
Good graphing requirements
A meaningful title
Appropriate labels
Scale consistent with data
Note: The table summarizes which graphs are appropriate for categorical and
quantitative data.
Variable                                         Graph
One quantitative variable                        Histogram, dot plot, stem and leaf plot, boxplot, etc.
One categorical variable                         Pie chart, bar graph, etc.
One quantitative and one categorical variable    Side by side histogram, side by side dot plot, side by side stem and leaf plot, side by side boxplot, etc.
Two categorical variables                        Side by side pie chart, side by side bar graph
Two quantitative variables                       Scatter plot (regression, time series, etc.)
A few examples using the survey data in Example 2.8
(a) One categorical (gender) and one quantitative variable (commute in miles)
(i) Side by side histogram:
(ii) Side by side dot plot:
(iii) Side by side boxplot:
(b) Two categorical variables, residence and gender
(i) Residence by gender:
(ii) Side by side bar graph: