Unit 1
Class 2
Today’s Agenda
Last class:
• Course outline✓
• Motivating examples✓
Today:
• Data and outliers
• Distributions
• Bar chart and pie chart
• Histogram
• Stem and leaf plot
Introduction
Chapter 1
Data
Unit/individual – An object (what we are measuring)
e.g., a person
Variable – A characteristic of a unit
e.g., height, age, handedness
Example Data:
Person  Height (in)  Age  Handedness
1       72           17   Left
2       68           20   Right
3       54           25   Left
Data
Types…
Qualitative (Descriptive)
• Ordinal: Has an order (e.g., letter grades)
• Nominal: No order (e.g., handedness)
Quantitative (Numeric)
• Discrete: Space b/w each number
• Continuous: No space
Binary: data only takes on two values
• e.g., If I flip a fair coin, I either get heads or tails
• e.g., The number of heads on one flip of a fair coin
Activity: Consider the students in this class as the units in a statistical study.
Part 1: For each of the following variables, is the variable qualitative or quantitative?
• How many classes are you taking this term?
• On what day of the week were you born?
• How many science fiction books have you read?
• Do you consider yourself an “early bird” or a “night owl”?
Part 2: Explain why the following questions are NOT variables.
• What percentage of students in this class were born on a Monday?
• What is the average number of science fiction books read by a student in this class?
Think: What would the units have to be for these questions to be legitimate variables?
Data Notation
Unsorted Data: $x_1, x_2, x_3, \ldots, x_n$, where $n$ is the sample size
Sorted Data: $\min = x_{(1)}, x_{(2)}, \ldots, x_{(n)} = \max$
E.g.: 3, 16, 11, 2
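As a small illustration of the notation, the example values can be sorted in R. This is only a sketch; the names x, n, and x_sorted are chosen here for illustration and do not come from the slides.

```r
# The example data from the slide, in the order collected (unsorted)
x <- c(3, 16, 11, 2)
n <- length(x)         # sample size

# Sorted data: x_(1) <= x_(2) <= ... <= x_(n)
x_sorted <- sort(x)

x_sorted[1]            # x_(1), the minimum
x_sorted[n]            # x_(n), the maximum
```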
Outliers
An extreme data value, relative to the rest of the data.
E.g.: 1, 2, 3, 4, 1000
E.g.: -1000, -4, -3, -2, -1
Outliers are due to…
• Typos
• Experimental error
• Random chance
Today’s Agenda
Today:
• Data and outliers✓
• Distributions
• Bar chart and pie chart
• Histogram
• Stem and leaf plot
Distributions (Dist’ns)
Chapter 1
Distributions (Dist’n)
A distribution tells us the frequency of the values of a variable.
Handedness of STAT Students  Frequency (Count)  Relative Frequency (Percent)
Left                         20                 20/80 × 100 = 25%
Right                        50                 50/80 × 100 = 62.5%
Ambidextrous                 10                 10/80 × 100 = 12.5%
n = 80 (sample size)
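The relative frequencies in the handedness table can be checked with a few lines of R. This is a minimal sketch using the counts from the table; the variable names (counts, rel_freq, hand) are illustrative assumptions.

```r
# Counts taken from the handedness table
counts <- c(Left = 20, Right = 50, Ambidextrous = 10)

n <- sum(counts)                  # sample size
rel_freq <- counts / n * 100      # relative frequency, in percent

data.frame(count = counts, percent = rel_freq)

# Starting from raw data instead: one label per student, then tabulate.
# Note that table() orders the categories alphabetically.
hand <- rep(names(counts), times = counts)
table(hand)                       # frequency (count)
prop.table(table(hand)) * 100     # relative frequency (percent)
```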
Data and R: Downloading R & RStudio
You can download R from: https://cran.r-project.org/
You can install a free version of RStudio desktop from:
https://www.rstudio.com/products/rstudio/download/
Highly recommended: See Learn document about how to install R and
RStudio!
Data and R: Datasets
You can read a dataset into RStudio.
First set the working directory: go to ‘Session -> Set Working Directory -> Choose Directory’ and select the folder that contains your dataset.
Then, read in the data by typing:
data <- read.table("c_grades.csv", sep = ",", header = TRUE)
To look at the dataset you can type:
data
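For a quick check that a file was read correctly, it can help to look at its structure and not only print it. The sketch below writes a tiny stand-in CSV to a temporary file so it runs without c_grades.csv. The three grade values are made up for illustration, and the column name Grade is assumed from the hist() call used later in this class.

```r
# Hypothetical stand-in file so the sketch runs without c_grades.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("Grade", "72", "85", "64"), tmp)

# read.csv() is read.table() with sep = "," and header = TRUE as defaults
grades <- read.csv(tmp)

head(grades)    # first few rows
str(grades)     # column names and types
nrow(grades)    # number of units (rows)
```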
Today’s Agenda
Today:
• Data and outliers✓
• Distributions✓
• Bar chart and pie chart
• Histogram
• Stem and leaf plot
Charts and Diagrams
Chapters 1-3
Plots: used to display our distributions and datasets.
Qualitative (Data)
• Bar Charts
• Pie Charts
Quantitative (Data)
• Histograms
• Stem and leaf plots
• Dot plots
• Boxplots
• Scatterplots
• Time Series plots
Will see in Unit 2
Qualitative Charts
Bar Chart
Figure 1.3: The data on the reported cases of infectious diseases in California for 2014. (a) The bars are sorted according to their height. (b) The bars are grouped by disease type.
Note: The order of the categories is irrelevant. A bar chart doesn’t tell us about the shape because of this!
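A bar chart can be drawn in R with barplot(). The sketch below reuses the handedness counts from the distribution table earlier in this class (not the disease data in the figure) and draws the bars in two different orders, to show that the order of the categories is a free choice. The variable names are illustrative.

```r
# Handedness counts from the distribution table earlier in this class
counts <- c(Left = 20, Right = 50, Ambidextrous = 10)

# Bars in the order the categories were listed
barplot(counts, main = "Handedness of STAT Students",
        ylab = "Frequency (Count)")

# Same data, bars sorted by height (tallest first)
sorted_counts <- sort(counts, decreasing = TRUE)
barplot(sorted_counts, main = "Handedness of STAT Students (sorted)",
        ylab = "Frequency (Count)")
```

Both charts show the same distribution; only the arrangement of the bars differs.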
Pie Chart
There is much debate about whether pie
charts are good or bad. Let’s watch a
video!
A quick Google search turns up many articles on the subject, such as:
• Pie Charts - Good, Bad, or Ugly?
• What is a pie chart?
What do you think? Do you like pie charts
or dislike them? Why?
An interesting point to consider: if pie
charts are bad, then why do we still
use/see them?
Today’s Agenda
Today:
• Data and outliers✓
• Distributions✓
• Bar chart and pie chart✓
• Histogram
• Stem and leaf plot
Quantitative Charts
Histogram
(like a bar chart)
How to Build a Histogram: Step 1
Grade  Interval  Count    Percent
D      [50,60)   3        2.3
C      [60,70)   28       21.5
B      [70,80)   65       50
A      [80,90)   31       23.8
A+     [90,100]  3        2.3
Sum              n = 130  100% (with rounding)
Note:
1. Interval, bin, and class are used interchangeably.
2. [ ]/( ) = include/exclude the number in the interval.
• e.g., [50, 60) = every number from 50 to 60, except 60
3. All intervals should have the same size. They should also have the same type(s) of brackets on the left & right (the last bin is an exception: it is closed on the right so that the maximum, 100, is included).
4. Build these bins to provide a nice “shape” for the dist’n.
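Step 1 can be sketched in R with cut(), which assigns each value to an interval. The grades below are made up for illustration (they are not the course dataset). The second part recomputes the Percent column from the counts in the table.

```r
# Hypothetical grades, made up for illustration (not the course dataset)
grades <- c(55, 60, 69.9, 70, 85, 90, 100)

# right = FALSE gives [a, b) bins; include.lowest = TRUE closes the
# last bin on the right, so it becomes [90, 100]
bins <- cut(grades, breaks = seq(50, 100, by = 10),
            right = FALSE, include.lowest = TRUE)
table(bins)    # count per interval

# Percent column of the table, recomputed from its counts
counts <- c(D = 3, C = 28, B = 65, A = 31, "A+" = 3)
percent <- round(counts / sum(counts) * 100, 1)
percent
```

Note that 60 falls in [60,70), not [50,60), which is exactly what the bracket convention in note 2 says.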
How to Build a Histogram: Step 2
Plot it.
Note:
1) Grade axis should be the bins
2) Y axis is the count (frequency) or percentage (relative frequency)
In R: The Grades Dataset
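# The dataset is 100 grades from a particular course
data <- read.table("c_grades.csv", sep = ",", header = TRUE)
# Note: hist() uses (a, b] bins by default; add right = FALSE for [a, b) bins
hist(data$Grade, breaks = c(40, 50, 60, 70, 80, 90, 100),
     main = "Hist of Grades", xlab = "Grades")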
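Since c_grades.csv is not reproduced in these notes, here is a self-contained sketch that rebuilds a histogram from the Step 1 frequency table instead. As a stand-in for the raw grades, each bin's midpoint is repeated as many times as its count; this is an assumption made purely for illustration, and the real grades would be spread within each bin.

```r
# Stand-in data: each bin's midpoint repeated by its count from the
# Step 1 table (illustration only; not the actual grades)
midpoints <- c(55, 65, 75, 85, 95)
counts    <- c(3, 28, 65, 31, 3)
grades    <- rep(midpoints, times = counts)

# right = FALSE makes the bins [50,60), [60,70), ..., [90,100]
h <- hist(grades, breaks = seq(50, 100, by = 10), right = FALSE,
          main = "Hist of Grades", xlab = "Grades")

h$counts    # the bin counts that hist() plotted
```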
In R:
Note: No space between the columns! Because the bins are ordered, we cannot logically move [60,70) to the end of the chart.
Today’s Agenda
Today:
• Data and outliers✓
• Distributions✓
• Bar chart and pie chart✓
• Histogram✓
• Stem and leaf plot
Stem and Leaf Plots
Steps:
1. Decide how to break your data
2. Add a legend
3. Draw the stems and leaves
Step 1: To draw a stem and leaf plot we take our data and decide where to break it.
Example 1: 21, 23, 45, 46, 47, 50, 51, 54, 55, 56, 57, 60
In this case we would probably break it at the tens place, i.e.
2|1, 2|3, 4|5, 4|6, 4|7, 5|0, 5|1, 5|4, 5|5, 5|6, 5|7, 6|0
Example 2: 0.21, 0.23, 0.45, 0.46, 0.47, 0.50, 0.51, 0.54, 0.55, 0.56, 0.57, 0.60
Here, we’d break it at the tenths place, i.e.
2|1, 2|3, 4|5, 4|6, 4|7, 5|0, 5|1, 5|4, 5|5, 5|6, 5|7, 6|0
Example 3: 2100, 2300, 4500, 4600, 4700, 5000, 5100, 5400, 5500, 5600, 5700, 6000
Here, we’d break it at the thousands place, i.e.
2|1, 2|3, 4|5, 4|6, 4|7, 5|0, 5|1, 5|4, 5|5, 5|6, 5|7, 6|0
Step 2: Give a legend. After deciding where to break the data, state where you broke it.
Example 1: 21, 23, 45, 46, 47, 50, 51, 54, 55, 56, 57, 60
2|1, 2|3, 4|5, 4|6, 4|7, 5|0, 5|1, 5|4, 5|5, 5|6, 5|7, 6|0
Legend: The decimal point is one digit to the right of the |
Example 2: 0.21, 0.23, 0.45, 0.46, 0.47, 0.50, 0.51, 0.54, 0.55, 0.56, 0.57, 0.60
2|1, 2|3, 4|5, 4|6, 4|7, 5|0, 5|1, 5|4, 5|5, 5|6, 5|7, 6|0
Legend: The decimal point is one digit to the left of the |
Example 3: 2100, 2300, 4500, 4600, 4700, 5000, 5100, 5400, 5500, 5600, 5700, 6000
2|1, 2|3, 4|5, 4|6, 4|7, 5|0, 5|1, 5|4, 5|5, 5|6, 5|7, 6|0
Legend: The decimal point is three digits to the right of the |
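One way to see how the legend turns a stem|leaf pair back into a number is the small helper below. It is a hypothetical function written for this illustration (not part of base R), and it assumes single-digit leaves. The argument digits_right counts how many digits to the right of the | the decimal point sits; a negative value means to the left.

```r
# Hypothetical helper (not a base R function): recover the data value
# from a stem and a single-digit leaf, given the legend.
# digits_right > 0: decimal point that many digits to the right of |
# digits_right < 0: decimal point that many digits to the left of |
decode_leaf <- function(stem, leaf, digits_right) {
  (10 * stem + leaf) * 10^(digits_right - 1)
}

decode_leaf(2, 1, 1)     # Example 1 legend: one digit to the right
decode_leaf(2, 1, -1)    # Example 2 legend: one digit to the left
decode_leaf(2, 1, 3)     # Example 3 legend: three digits to the right
```

The three calls recover the first data value of Examples 1, 2, and 3 respectively from the same pair 2|1.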
Clicker Question
Given “Legend: The decimal is two digits to the right of the |” the
number 6|2 would be written as:
A) 0.062
B) 6.2
C) 62
D) 620
E) None of the above
Step 3: Put the stem on the left and attach the leaves to the right in order from smallest to largest.
Example 1: 21, 23, 45, 46, 47, 50, 51, 54, 55, 56, 57, 60
2|1, 2|3, 4|5, 4|6, 4|7, 5|0, 5|1, 5|4, 5|5, 5|6, 5|7, 6|0
The decimal point is 1 digit(s) to the right of the |
2 | 13
3 |
4 | 567
5 | 014567
6 | 0
(Stem: the digits to the left of the |. Leaves: the digits to the right of the |.)
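In R, the base function stem() prints a stem and leaf plot directly. The sketch below runs it on Example 1 and then rebuilds the same rows by hand, following Steps 1 to 3. The hand-built part is an illustration with assumed variable names, and stem() may choose a different stem width than a hand-drawn plot depending on the data.

```r
# Example 1 data
x <- sort(c(21, 23, 45, 46, 47, 50, 51, 54, 55, 56, 57, 60))

# Base R stem and leaf plot (prints its own legend)
stem(x)

# By hand: break at the tens place
stems  <- x %/% 10    # tens digit
leaves <- x %% 10     # ones digit

# Collect the leaves for each stem, keeping empty stems such as 3
rows <- sapply(split(leaves, factor(stems, levels = 2:6)),
               paste, collapse = "")
cat(sprintf("%s | %s\n", names(rows), rows), sep = "")
```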
Clicker Question
The decimal point is 1 digit(s) to the right of the |
2 | 13
3 |
4 | 57
5 | 01457
6 | 0
Given the stem and leaf plot above how many units are there?
A) 11
B) 10
C) 5
D) 15
E) None of the above
Today’s Agenda
Today:
• Data and outliers✓
• Distributions✓
• Bar chart and pie chart✓
• Histogram✓
• Stem and leaf plot✓
Homework Questions
• Chapter 1
• 1.15-1.19 (odds), 1.21, 1.33, 1.39a
• Before next class, you must complete the following:
• Learn about the mode, sample mean and median by
1. Reading Ch. 2 “Measures of Center: Median, Mean” (pg. 40-44) in the textbook, or
2. Watching this video (skip the trimmed mean section)