Chapter 2.
Descriptive Statistics: Tabular and
Graphical Displays
► There are two types of data sets: categorical and quantitative.
► Categorical data use labels or names to identify categories of like items.
► Quantitative data are numerical values that indicate how much or how many.
► Example 1:
Coca-Cola, Pepsi, Pepsi, Coca-Cola, Sprite, Coca-Cola, Sprite, Pepsi, Coca-Cola, Pepsi
► Example 2: Number of days required to audit each account
10 5 2 9 8 11 3 12 4 7
► Data visualization is a term used to describe the use of graphical displays to summarize data or present information about a data set.
2.1 Summarizing Data for a Categorical
Variable
Frequency, Relative Frequency, and Percent
Frequency Distributions
Category     Frequency   Relative frequency   Percent frequency
Coca-Cola    4           4/10 = 0.4           (0.4)(100) = 40%
Pepsi        4           4/10 = 0.4           (0.4)(100) = 40%
Sprite       2           2/10 = 0.2           (0.2)(100) = 20%
Cumulative Distributions
Category     Cum. frequency   Cum. relative frequency   Cum. percent frequency
Coca-Cola    4                0.4                       40%
Pepsi        8                0.8                       80%
Sprite       10               1.0                       100%
► The relative frequencies add up to 1.
► The percent frequencies add up to 100%.
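The frequency, relative frequency, percent frequency, and cumulative frequency columns for Example 1 can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the names `drinks` and `frequency_distribution` are illustrative and do not come from the slides.

```python
from collections import Counter

# Example 1 data from the slides
drinks = ["Coca-Cola", "Pepsi", "Pepsi", "Coca-Cola", "Sprite",
          "Coca-Cola", "Sprite", "Pepsi", "Coca-Cola", "Pepsi"]

def frequency_distribution(values):
    """Return rows of (category, freq, rel_freq, pct_freq, cum_freq), sorted by category."""
    counts = Counter(values)
    n = len(values)
    rows, running = [], 0
    for category in sorted(counts):
        freq = counts[category]
        running += freq  # cumulative frequency up to and including this category
        rows.append((category, freq, freq / n, 100 * freq / n, running))
    return rows

table = frequency_distribution(drinks)
for category, freq, rel, pct, cum in table:
    print(f"{category:<10} freq={freq} rel={rel:.1f} pct={pct:.0f}% cum={cum}")
```

Summing the third column gives 1 and summing the fourth gives 100%, which is the check stated in the two bullets above.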
Histogram (= Bar Chart)
The y axis of a histogram is relative frequency.
Example 1:
Coca-Cola, Pepsi, Pepsi, Coca-Cola, Sprite, Coca-Cola, Sprite, Pepsi, Coca-Cola, Pepsi
Shapes of Histograms
Pie Chart
Draw a circle to represent all the data. Then subdivide
the circle into parts that correspond to the relative
frequency for each class.
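Subdividing the circle amounts to giving each class a slice angle equal to its relative frequency times 360 degrees. The sketch below applies that rule to the soft drink frequencies from Example 1; the function name `pie_angles` is illustrative.

```python
# Frequencies from the soft drink frequency distribution (Example 1)
frequencies = {"Coca-Cola": 4, "Pepsi": 4, "Sprite": 2}

def pie_angles(freqs):
    """Map each class to its slice angle in degrees (relative frequency x 360)."""
    total = sum(freqs.values())
    return {name: 360 * count / total for name, count in freqs.items()}

angles = pie_angles(frequencies)
for name, angle in angles.items():
    print(f"{name:<10} {angle:6.1f} degrees")
```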
Creating Frequency Distribution and Bar Chart
in Excel
Step 1: Select any cell in the data set.
Step 2: Click the Insert tab on the ribbon.
Step 3: Click Recommended Charts.
Step 4: Click OK.
Step 5: Select the Frequency Distribution table and click on the bar chart icon.
2.2 Summarizing Data for a Quantitative
Variable
Example 2: 10, 5, 2, 9, 8, 11, 3, 12, 4, 7
Frequency, Relative Frequency, and Percent
Frequency Distributions
Range    Frequency   Relative frequency   Percent frequency
0-4      3           3/10 = 0.3           (0.3)(100) = 30%
5-9      4           4/10 = 0.4           (0.4)(100) = 40%
10-14    3           3/10 = 0.3           (0.3)(100) = 30%
Cumulative Distributions
Range    Cum. frequency   Cum. relative frequency   Cum. percent frequency
0-4      3                0.3                       30%
5-9      7                0.7                       70%
10-14    10               1.0                       100%
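For quantitative data the only extra step is assigning each value to a class. A minimal sketch, assuming the inclusive class limits 0-4, 5-9, and 10-14 used in the table above (the helper `class_frequencies` is an illustrative name):

```python
# Example 2 data: number of days required to audit each account
days = [10, 5, 2, 9, 8, 11, 3, 12, 4, 7]

# Class limits taken from the table above: 0-4, 5-9, 10-14
classes = [(0, 4), (5, 9), (10, 14)]

def class_frequencies(values, class_limits):
    """Count how many values fall in each inclusive (lower, upper) class."""
    return [sum(lower <= v <= upper for v in values) for lower, upper in class_limits]

freqs = class_frequencies(days, classes)
n = len(days)
cumulative = 0
for (lower, upper), freq in zip(classes, freqs):
    cumulative += freq
    label = f"{lower}-{upper}"
    print(f"{label:<6} freq={freq} rel={freq / n:.1f} pct={100 * freq / n:.0f}% cum={cumulative}")
```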
Histogram (= Bar Chart)
The y axis of a histogram is relative frequency.
Example 2: 10, 5, 2, 9, 8, 11, 3, 12, 4, 7
How to Create Frequency Distribution and Bar
Chart in Excel
Use PivotTable.
softdrink.xlsx
Audit.xlsx
Dot Plot
A horizontal axis shows the range for the data. Each
data value is represented by a dot placed above the axis.
Example: 12, 15, 16, 13, 12, 14, 15, 12, 14
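A dot plot of the example data can be approximated in plain text by stacking one dot per data value above each position on the axis. This is only a rough sketch; `dot_plot` is an illustrative helper, and the axis is assumed to run over the integers from the smallest to the largest value.

```python
from collections import Counter

# Data from the dot plot example
data = [12, 15, 16, 13, 12, 14, 15, 12, 14]

def dot_plot(values):
    """Return a text dot plot: stacked dots above a horizontal axis of integer values."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    lines = []
    # Build rows from the tallest stack down to the row just above the axis
    for level in range(max(counts.values()), 0, -1):
        row = "  ".join(" ." if counts[x] >= level else "  " for x in axis)
        lines.append(row.rstrip())
    lines.append("  ".join(f"{x:2d}" for x in axis))
    return "\n".join(lines)

print(dot_plot(data))
```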
Stem–and–Leaf Display
► Arrange the leading digits of each data value to the
left of a vertical line.
► To the right of the vertical line, we record the last
digit for each data value.
112 97 107 92 86 126 128 118 127 124
104 92 108 96 100 92 115 91 102 81
► Put the digits on each line in order.
► Use a rectangle to contain the leaves of each stem.
► Rotating this page counterclockwise onto its side
provides a picture that is similar to a histogram.
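The steps above can be sketched in a few lines: split each value into a stem (leading digits) and a leaf (last digit), then list the sorted leaves beside each stem. The function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Data values from the stem-and-leaf example
data = [112, 97, 107, 92, 86, 126, 128, 118, 127, 124,
        104, 92, 108, 96, 100, 92, 115, 91, 102, 81]

def stem_and_leaf(values):
    """Return {stem: sorted leaves}, where the leaf is the last digit of each value."""
    display = defaultdict(list)
    for v in sorted(values):  # sorting first puts the leaves on each line in order
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)

display = stem_and_leaf(data)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>2} | {leaves}")
```

The longest line belongs to stem 9, so the rotated display would peak in the 90s.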
2.3 Summarizing Data for Two Variables Using
Tables
A crosstabulation is a tabular summary of data for two variables.
Quality rating and meal price data for 10 LA restaurants:
Restaurant   Quality Rating   Meal Price ($)
1            Good             18
2            Very Good        22
3            Good             28
4            Excellent        38
5            Very Good        33
6            Good             28
7            Very Good        19
8            Very Good        11
9            Very Good        23
10           Good             13
                 Meal Price
Quality Rating   $10–19   $20–29   $30–39   Total
Good             2        2        0        4
Very Good        2        2        1        5
Excellent        0        0        1        1
Total            4        4        2        10
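The crosstabulation can be built by assigning each restaurant's meal price to a class and counting the (rating, class) pairs. A minimal sketch using the 10 restaurants listed above; `price_class` and `crosstab` are illustrative names.

```python
from collections import Counter

# (quality rating, meal price in dollars) for the 10 restaurants listed above
restaurants = [("Good", 18), ("Very Good", 22), ("Good", 28), ("Excellent", 38),
               ("Very Good", 33), ("Good", 28), ("Very Good", 19), ("Very Good", 11),
               ("Very Good", 23), ("Good", 13)]

ratings = ["Good", "Very Good", "Excellent"]
price_classes = [(10, 19), (20, 29), (30, 39)]

def price_class(price):
    """Return the (lower, upper) meal price class that contains price."""
    for lower, upper in price_classes:
        if lower <= price <= upper:
            return (lower, upper)
    raise ValueError(f"price {price} is outside every class")

# Count restaurants for each (rating, price class) pair
cells = Counter((rating, price_class(price)) for rating, price in restaurants)
crosstab = {r: [cells[(r, c)] for c in price_classes] for r in ratings}

for rating, row in crosstab.items():
    print(f"{rating:<10} {row} total={sum(row)}")
```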
How to Create a Crosstabulation in
Excel
Quality Rating.xlsx
Use PivotTable.
Count of Restaurant   Column Labels
Row Labels    10-19   20-29   30-39   40-49   Grand Total
Good          42      40      2               84
Very Good     34      64      46      6       150
Excellent     2       14      28      22      66
Grand Total   78      118     76      28      300
We obtain column percentages by dividing each element in a particular column by the total for that column.
                 Meal Price
Quality Rating   $10–19   $20–29   $30–39
Good             50%      50%      0%
Very Good        50%      50%      50%
Excellent        0%       0%       50%
Total            100%     100%     100%
We obtain row percentages by dividing each element in a particular row by the total for that row.
                 Meal Price
Quality Rating   $10–19   $20–29   $30–39   Total
Good             50%      50%      0%       100%
Very Good        40%      40%      20%      100%
Excellent        0%       0%       100%     100%
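Both kinds of percentages follow from the same table of counts; only the divisor changes. The sketch below starts from the counts for the 10 restaurants. The function names are illustrative, and the code assumes no row or column total is zero.

```python
# Counts from the crosstabulation of the 10 restaurants
# (rows: quality rating; columns: meal price classes $10-19, $20-29, $30-39)
counts = {
    "Good":      [2, 2, 0],
    "Very Good": [2, 2, 1],
    "Excellent": [0, 0, 1],
}

def row_percentages(table):
    """Divide each cell by its row total and express the result as a percent."""
    return {r: [100 * c / sum(row) for c in row] for r, row in table.items()}

def column_percentages(table):
    """Divide each cell by its column total and express the result as a percent."""
    col_totals = [sum(col) for col in zip(*table.values())]
    return {r: [100 * c / t for c, t in zip(row, col_totals)]
            for r, row in table.items()}

row_pct = row_percentages(counts)
col_pct = column_percentages(counts)
for rating in counts:
    print(f"{rating:<10} row %: {row_pct[rating]}  column %: {col_pct[rating]}")
```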
Simpson’s Paradox
Simpson's paradox occurs when conclusions from separate crosstabulations are reversed when the data is aggregated.
            Judge
Verdict     Luckett      Kendall      Total
Upheld      129 (86%)    110 (88%)    239
Reversed    21 (14%)     15 (12%)     36
Total (%)   150 (100%)   125 (100%)   275
Which judge performed better overall?
Judge Luckett
Verdict     Common Pleas   Municipal Court   Total
Upheld      29 (91%)       100 (85%)         129
Reversed    3 (9%)         18 (15%)          21
Total (%)   32 (100%)      118 (100%)        150
Judge Kendall
Verdict     Common Pleas   Municipal Court   Total
Upheld      90 (90%)       20 (80%)          110
Reversed    10 (10%)       5 (20%)           15
Total (%)   100 (100%)     25 (100%)         125
What caused the paradox?
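The reversal can be checked directly from the counts in the two per-judge tables. This sketch computes the percent upheld per court and after aggregation; the dictionary layout and function names are illustrative.

```python
# (upheld, total) verdict counts per judge and court, from the tables above
cases = {
    "Luckett": {"Common Pleas": (29, 32), "Municipal Court": (100, 118)},
    "Kendall": {"Common Pleas": (90, 100), "Municipal Court": (20, 25)},
}

def upheld_rate(upheld, total):
    """Percent of verdicts upheld."""
    return 100 * upheld / total

def overall_rate(courts):
    """Aggregate a judge's courts, then compute the percent upheld."""
    upheld = sum(u for u, _ in courts.values())
    total = sum(t for _, t in courts.values())
    return upheld_rate(upheld, total)

for judge, courts in cases.items():
    by_court = ", ".join(f"{court}: {upheld_rate(*ut):.0f}%" for court, ut in courts.items())
    print(f"{judge}: {by_court}, overall: {overall_rate(courts):.0f}%")
```

One reading of the numbers: Luckett has the higher upheld rate in each court, but 118 of Luckett's 150 cases are in Municipal Court, where both judges have lower rates, while 100 of Kendall's 125 cases are in Common Pleas. The unequal mix of courts is what lets the aggregated rates point the other way.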
2.4 Summarizing Data for Two Variables Using
Graphical Displays
A scatter diagram is a graphical representation of the relationship between two quantitative variables. A trendline is a line that provides an approximation of that relationship.
Example:
No. of Commercials   Weekly Sales (in thousands of dollars)
2                    50
5                    57
3                    54
Scatter Diagram:
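The slides do not say how the trendline is fitted. One common choice is the least-squares line, sketched here for the three rows shown in the example only, so the result is illustrative rather than the trendline for any fuller data set. The function name `least_squares_line` is an assumption.

```python
# The three (commercials, weekly sales in $1000s) pairs shown in the example
points = [(2, 50), (5, 57), (3, 54)]

def least_squares_line(pairs):
    """Return (slope, intercept) of the least-squares line through the pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = least_squares_line(points)
print(f"sales = {intercept:.2f} + {slope:.2f} * commercials")
```

The slope is positive, so for these three points sales rise with the number of commercials.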
Shapes of Scatter Diagrams
(P = positive relationship, N = negative relationship, NA = no apparent relationship)
1. Study hours vs test score (P)
2. Demand vs price (N)
3. Supply vs price (P)
4. Age and test score (NA)
Side-by-Side Bar Chart
A side-by-side bar chart is a graphical display for
depicting multiple bar charts on the same display.
                 Meal Price
Quality Rating   $10–19   $20–29   $30–39   $40–49
Good             53.8%    33.9%    2.6%     0%
Very Good        43.6%    54.2%    60.5%    21.4%
Excellent        2.6%     11.9%    36.8%    78.6%
Total            100%     100%     100%     100%
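The percentages in this table are the column percentages of the 300-restaurant PivotTable counts shown earlier. The sketch below recomputes them and prints a rough text version of a side-by-side bar chart, one group per price class. The blank PivotTable cell (Good, 40-49) is taken as 0, and the scale of one `#` per 2% is an arbitrary choice.

```python
# Restaurant counts from the Excel PivotTable output shown earlier
# (rows: quality rating; columns: meal price classes; blank cell taken as 0)
price_classes = ["$10-19", "$20-29", "$30-39", "$40-49"]
counts = {
    "Good":      [42, 40, 2, 0],
    "Very Good": [34, 64, 46, 6],
    "Excellent": [2, 14, 28, 22],
}

def column_percent_table(table):
    """Percent of each column total, rounded to one decimal place."""
    totals = [sum(col) for col in zip(*table.values())]
    return {r: [round(100 * c / t, 1) for c, t in zip(row, totals)]
            for r, row in table.items()}

percents = column_percent_table(counts)

# One group of bars per price class, one bar per quality rating (2% per '#')
for j, price in enumerate(price_classes):
    print(price)
    for rating, row in percents.items():
        bar = "#" * round(row[j] / 2)
        print(f"  {rating:<10} {bar:<50} {row[j]}%")
```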
Stacked Bar Chart
A stacked bar chart is a bar chart in which each bar is
broken into rectangular segments of a different color
showing the relative frequency of each class.
                 Meal Price
Quality Rating   $10–19   $20–29   $30–39   $40–49
Good             53.8%    33.9%    2.6%     0%
Very Good        43.6%    54.2%    60.5%    21.4%
Excellent        2.6%     11.9%    36.8%    78.6%
Total            100%     100%     100%     100%
How to Create Side-by-Side and Stacked Bar
Charts in Excel
Quality Rating.xlsx
Use PivotTable.
Choosing the Type of Graphical Display
Group the following charts into three categories:
► Displays Used to Show the Distribution of Data
Bar Chart, Pie Chart, Dot Plot, Histogram, Stem-and-Leaf Display
► Displays Used to Make Comparisons
Side-by-Side Bar Chart, Stacked Bar Chart
► Displays Used to Show Relationships
Scatter Diagram, Trendline