Lecture 3 Numerical Measures of Data

Uploaded by

nicklin0419

Lecture 3.

Numerical Measures of Data

AGEC 2001 Statistics I

Feng-An Yang

Department of Agricultural Economics

National Taiwan University

Fall Semester

Outline
- Measures of Location
  - Mean
  - Median
  - Mode
  - Shape of a distribution
- Measures of Variation
  - Range
  - Variance and Standard Deviation
  - Coefficient of Variation
  - Grouped Data
- Measures of Position
  - Percentile
  - Location of Percentile
  - Quartile and Decile
  - Box plot

Measures of Location

Measures of Location
Numerical measures used to describe the central tendency of the data
- Common measures of location
  - Mean
  - Median
  - Mode

Mean

Mean
A numerical average of a set of numbers
- Arithmetic Mean
- Weighted Mean
- Geometric Mean

Example
- The mean height of AGEC students is 172 cm.
- The mean weight of AGEC students is 55.3 kg.

Arithmetic Mean

Arithmetic Mean
The arithmetic mean is the simplest and most widely used type of mean: the sum of all the values in a dataset divided by the number of observations in that dataset

Population Mean
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

- $\mu$ is the population mean
- $N$ is the number of observations in the population
- $x_i$ is the value of the i-th observation

Arithmetic Mean

Sample Mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

- $\bar{x}$ is the sample mean
- $n$ is the number of observations in the sample

Example
{90,77,94,89,119,112,91,110,92,100,113,83}
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{90 + 77 + \cdots + 83}{12} = \frac{1{,}170}{12} = 97.5$$

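The sample-mean calculation can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` is an illustrative choice, not something from the lecture, and the data are the twelve values from the example.

```python
# The twelve observations from the sample-mean example.
scores = [90, 77, 94, 89, 119, 112, 91, 110, 92, 100, 113, 83]


def sample_mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


x_bar = sample_mean(scores)

# Deviations from the mean; their sum is zero (up to floating-point rounding).
deviation_sum = sum(x - x_bar for x in scores)

print(sum(scores))  # 1170
print(x_bar)        # 97.5
```

The variable `deviation_sum` illustrates the property that deviations from the mean always sum to zero.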
Arithmetic Mean

Properties of the Arithmetic Mean

- All values in the dataset are used in the calculation of the mean
- The mean is unique
- The sum of the deviations from the mean is zero
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

Example
{3,7,5}, $\bar{x} = 5$
$$\sum_{i=1}^{n} (x_i - \bar{x}) = (3 - 5) + (7 - 5) + (5 - 5) = 0$$

Arithmetic Mean

Properties of the Arithmetic Mean (cont’d)

- The mean can be affected by extreme values

Example
- A={1,2,3,4,5}, $\bar{x}_A = 3$
- B={1,2,3,4,100}, $\bar{x}_B = 22$

Median

Median
The midpoint of all values in a dataset

Steps for finding the median

- Sort the data in ascending (or descending) order
- If the number of observations n is odd, the median is the value at position $\frac{n+1}{2}$
  - Example: {11, 17, 25, 38, 60}. The median is 25
- If the number of observations is even, the median is the simple average of the two middle values
  - Example: {11, 17, 25, 38, 60, 65}. The median is $\frac{25+38}{2} = 31.5$

Median

Median
- The median is less sensitive to extreme values than the mean
- The median is unique

Example
- A={1,2,3,4,5}, $\bar{x}_A = 3$, median=3
- B={1,2,3,4,100}, $\bar{x}_B = 22$, median=3

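The steps for finding the median, and the comparison with the mean on sets A and B, can be sketched in Python. The helper names `median` and `mean` are illustrative assumptions; Python's standard library also provides `statistics.median`, which gives the same results here.

```python
def median(values):
    """Median by the sort-then-pick-the-middle rule."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value at 1-based position (n + 1) / 2.
        return ordered[mid]
    # Even n: simple average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean(values):
    return sum(values) / len(values)


a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 100]

print(median([11, 17, 25, 38, 60]))      # 25
print(median([11, 17, 25, 38, 60, 65]))  # 31.5
print(mean(a), median(a))                # 3.0 3
print(mean(b), median(b))                # 22.0 3
```

Replacing 5 with 100 moves the mean from 3 to 22, while the median stays at 3.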
Mode

Mode
The value that appears most often in a dataset
- The mode is less sensitive to extreme values than the mean
- There may be multiple modes

Steps for finding the mode

- Organize the data into a frequency table
- The mode is the value (or values) with the highest frequency

Mode

Example
{4,4,4,3,100,3,1,3,5,2,2,5,6,1,2,2,3,7,
1,3,7,8,1,4,7,5,2,2,5,1,1,3,3,1,2}

Value    Frequency
1        7
2        7
3        7
4        4
5        4
6        1
7        3
8        1
100      1

- The modes are 1, 2, and 3

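The frequency-table approach to finding the mode can be sketched with `collections.Counter` from the Python standard library. The function name `modes` is an illustrative assumption; the data are the 35 values from the example.

```python
from collections import Counter

data = [4, 4, 4, 3, 100, 3, 1, 3, 5, 2, 2, 5, 6, 1, 2, 2, 3, 7,
        1, 3, 7, 8, 1, 4, 7, 5, 2, 2, 5, 1, 1, 3, 3, 1, 2]


def modes(values):
    """Return every value that attains the highest frequency, in ascending order."""
    counts = Counter(values)          # the frequency table
    highest = max(counts.values())    # the largest frequency
    return sorted(v for v, c in counts.items() if c == highest)


frequency_table = dict(sorted(Counter(data).items()))
print(frequency_table)
print(modes(data))  # [1, 2, 3]
```

Returning a list rather than a single value reflects the point that a dataset may have several modes.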
Shape of a distribution

Skewness
Skewness is a measure of the asymmetry of a data distribution

(Figure: three density curves with the mean, median, and mode marked on each.)
(a) Left-skewed: Mean < Median
(b) Symmetric: Mean = Median
(c) Right-skewed: Mean > Median

Measures of Variation
Measures of Variation
Numerical measures used to describe the spread of data
- Common measures of variation
  - Range
  - Variance and Standard Deviation
  - Coefficient of Variation

Why study dispersion?

Measures of location describe the central tendency of the data, but they tell nothing about its variability. Two data distributions can have the same central tendency but quite different variability

(Figure: density curves plotted against x.)

Range

Range
The difference between the largest and the smallest values in a dataset
Range = Maximum value - Minimum value

Example
{7,8,13,15,27,30}, Range=30-7=23

Issues
- It can be affected by extreme values
  - {7,8,13,15,27,30}, Range=30-7=23
  - {7,8,13,15,27,130}, Range=130-7=123
- It tells nothing about how the data are distributed

Variance

Variance
The arithmetic mean of the squared deviations from the mean

Population Variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

- $\sigma^2$ is the population variance
- $x_i$ is the value of the i-th observation
- $\mu$ is the population mean
- $N$ is the number of observations in the population

Variance

Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

- $s^2$ is the sample variance
- $\bar{x}$ is the sample mean

Sample Standard Deviation

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

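Variance

$$
\begin{aligned}
s^2 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} \left(x_i^2 - 2 x_i \bar{x} + \bar{x}^2\right)}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + n\bar{x}^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}
\end{aligned}
$$
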
Variance
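Example

x       x^2     x − x̄    (x − x̄)^2
12      144     -5       25
20      400     3        9
16      256     -1       1
18      324     1        1
19      361     2        4
Total   85      1485     0        40

Here $\bar{x} = 85/5 = 17$ (the Total row lists the sums of x, x^2, x − x̄, and (x − x̄)^2 in that order).

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{40}{5-1} = 10$$
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} = \frac{1485 - 5 \times 17^2}{5-1} = 10$$
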
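Both forms of the sample variance, the definition and the shortcut formula derived earlier, can be compared numerically. The sketch below uses the five values from the example; the function names are illustrative assumptions.

```python
import math

x = [12, 20, 16, 18, 19]


def sample_variance(values):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((v - x_bar) ** 2 for v in values) / (n - 1)


def sample_variance_shortcut(values):
    """Shortcut form: (sum of x_i^2 - n * x_bar^2) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return (sum(v * v for v in values) - n * x_bar ** 2) / (n - 1)


s2 = sample_variance(x)
s2_shortcut = sample_variance_shortcut(x)
s = math.sqrt(s2)  # sample standard deviation

print(s2, s2_shortcut)  # 10.0 10.0
print(round(s, 3))      # 3.162
```

The two forms agree algebraically. In floating-point arithmetic the shortcut form can lose precision when the mean is large relative to the spread, so the definition is usually the safer choice in code.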
Variance

Properties of Variance
- Variance and standard deviation can never be negative
- Variance and standard deviation do not depend on the location of the data: adding the same constant to every value leaves them unchanged
- The more concentrated the data are, the smaller the variance and standard deviation
- What if there is no variation in the data, i.e., all values are the same?

(Figure: density curves plotted against x.)

Empirical Rule

Empirical Rule
For a symmetrical, bell-shaped distribution, approximately 68%, 95%, and 99.7% of the observations lie within plus and minus one, two, and three standard deviations of the mean, respectively
- $\Pr(\mu - \sigma \le X \le \mu + \sigma) \approx 68\%$
- $\Pr(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 95\%$
- $\Pr(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 99.7\%$

(Figure: bell curve with the 68%, 95%, and 99.7% regions marked between −3σ and 3σ around µ.)

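For an exactly normal distribution, the three percentages in the empirical rule can be reproduced from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `share_within` is an illustrative assumption. Real data that are only roughly bell-shaped will match these figures only approximately.

```python
from statistics import NormalDist


def share_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```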
Chebyshev’s Theorem

Chebyshev’s Theorem
For any set of observations (sample or population), the proportion of values that lie within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any value greater than 1

Example
The average height of AGEC students is 170 cm and the corresponding standard deviation is 10 cm. At least what percentage of students have heights within 3 standard deviations of the mean, i.e., between 140 cm and 200 cm?
$$1 - \frac{1}{k^2} = 1 - \frac{1}{3^2} = 1 - \frac{1}{9} \approx 0.89$$

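Chebyshev's bound is simple enough to tabulate directly. The sketch below is illustrative only: the function names are assumptions, and the mean and standard deviation are the ones from the height example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def interval(mean, sd, k):
    """The interval from mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd


low, high = interval(170, 10, 3)
print(low, high)                           # 140 200
print(round(chebyshev_lower_bound(3), 2))  # 0.89
print(round(chebyshev_lower_bound(2), 2))  # 0.75
```

Because the theorem holds for any distribution, the bound is conservative: for bell-shaped data the empirical rule gives about 99.7% within three standard deviations, against Chebyshev's guaranteed minimum of about 89%.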
Coefficient of Variation

Coefficient of Variation (CV)

The coefficient of variation is a standardized measure of the dispersion of a data distribution, expressed as a percentage
- $CV = \frac{s}{\bar{x}} \times 100\%$, where $s$ is the sample standard deviation and $\bar{x}$ is the sample mean
- It quantifies the variability relative to the mean and facilitates the comparison of variability among data distributions with different units or significantly different means

Coefficient of Variation
Example

Pollutant   Mean        Standard Deviation   CV
PM2.5       100 µg/m³   10 µg/m³             10%
Ozone       50 ppm      10 ppm               20%

Relative to the mean, the ozone concentration is more variable than the PM2.5 concentration

Example

Company   Mean Production   Standard Deviation   CV
A         10000             10                   0.1%
B         50                10                   20%

Companies A and B have the same standard deviation in their production, but company B's production is more variable relative to its mean

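The CV values in the two tables can be recomputed in a few lines. This is a minimal sketch; the function name `coefficient_of_variation` is an illustrative assumption, and the inputs are the means and standard deviations listed in the tables.

```python
def coefficient_of_variation(mean, sd):
    """CV as a percentage: standard deviation divided by the mean, times 100."""
    return 100 * sd / mean


# (mean, standard deviation) pairs taken from the two examples.
examples = {
    "PM2.5": (100, 10),
    "Ozone": (50, 10),
    "Company A": (10000, 10),
    "Company B": (50, 10),
}

cv = {name: coefficient_of_variation(m, s) for name, (m, s) in examples.items()}
for name, value in cv.items():
    print(f"{name}: {value:g}%")
# PM2.5: 10%
# Ozone: 20%
# Company A: 0.1%
# Company B: 20%
```

Since the units cancel in the ratio, the PM2.5 and ozone figures can be compared even though one is measured in µg/m³ and the other in ppm.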
Arithmetic Mean of Grouped data

Mean
$$\bar{x} = \frac{\sum f M}{n}$$
- $f$ is the frequency in each class
- $M$ is the midpoint of each class
- $n = \sum f$ is the total number of observations; the sums run over the classes

Example
Point    Frequency (f)   Midpoint (M)   f × M
0-10     5               5              25
10-20    1               15             15
20-30    3               25             75
30-40    4               35             140
40-50    2               45             90
Total    15                             345

$$\bar{x} = \frac{\sum f M}{n} = \frac{345}{15} = 23$$

Standard Deviation of Grouped data

Standard Deviation
$$s = \sqrt{\frac{\sum f (M - \bar{x})^2}{n-1}}$$

Example
Point    Frequency (f)   Midpoint (M)   f × M   (M − x̄)   (M − x̄)^2   f (M − x̄)^2
0-10     5               5              25      -18        324          1620
10-20    1               15             15      -8         64           64
20-30    3               25             75      2          4            12
30-40    4               35             140     12         144          576
40-50    2               45             90      22         484          968
Total    15                             345                             3240

$$s = \sqrt{\frac{\sum f (M - \bar{x})^2}{n-1}} = \sqrt{\frac{3240}{14}} \approx 15.21$$

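The grouped-data mean and standard deviation can be reproduced from the class limits and frequencies alone. The sketch below assumes each class is stored as a hypothetical `(lower, upper, frequency)` tuple and, as in the tables, lets the midpoint stand in for every observation in the class.

```python
import math

# (lower limit, upper limit, frequency) for each class in the example.
classes = [(0, 10, 5), (10, 20, 1), (20, 30, 3), (30, 40, 4), (40, 50, 2)]


def grouped_mean_sd(groups):
    """Approximate the mean and sample standard deviation from grouped data."""
    n = sum(f for _, _, f in groups)                       # total observations
    mids = [((lo + hi) / 2, f) for lo, hi, f in groups]    # class midpoints M
    mean = sum(f * m for m, f in mids) / n                 # sum of f*M over n
    ss = sum(f * (m - mean) ** 2 for m, f in mids)         # sum of f*(M - mean)^2
    return mean, math.sqrt(ss / (n - 1))


x_bar, s = grouped_mean_sd(classes)
print(x_bar)        # 23.0
print(round(s, 2))  # 15.21
```

These are approximations to the statistics of the raw data, since the individual values within each class are not known.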
Measures of Position

Measures of Position
Numerical measures used to divide data into equal parts
- Common measures of position
  - Quartile
  - Decile
  - Percentile

Percentile

Percentile
A percentile is a value below which a given percentage of the observations in a dataset fall

Example
- If the 87th percentile is 90, then 87% of the observations are below 90

Location of Percentile

Steps for finding the pth percentile

1. Sort the data in ascending order
2. Multiply p percent by the number of observations n in the data: $i = \frac{p}{100} \times n$. Call the result the index i
3. Check the index from Step 2
   - If i is a whole number, the pth percentile is the simple average of the ith and (i + 1)th values in the ordered data
   - Otherwise, round the index up to the nearest whole number $\lceil i \rceil$. The pth percentile is the $\lceil i \rceil$th value in the ordered data

Note
There are other ways to determine a percentile, such as the nearest-rank method and the linear interpolation method

Location of Percentile

Example
{43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87,
88, 89, 93, 95, 96, 98, 99, 99}
- Suppose we want to find the 60th percentile. Index $i = \frac{60}{100} \times 25 = 15$
- Since the index is a whole number, the 60th percentile is the simple average of the 15th value (79) and the 16th value (85)
- $P_{60} = \frac{79+85}{2} = 82$

Location of Percentile

Example
{34, 42, 51, 65, 69, 74, 78, 84, 85, 85, 86, 87}
- Suppose we want to find the 80th percentile. Index $i = \frac{80}{100} \times 12 = 9.6$
- Since the index is not a whole number, we round it up to 10. The 80th percentile is then the value at the 10th position in the ordered data
- $P_{80} = 85$

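The percentile rule used in these examples can be written as a short function. This is a sketch under stated assumptions: the name `percentile` is illustrative, p is taken to be a whole number strictly between 0 and 100, and integer arithmetic is used so that the whole-number check on the index is exact. Library routines often default to an interpolation method instead, so their results may differ from this rule.

```python
def percentile(values, p):
    """pth percentile by the index rule: i = p/100 * n, average or round up."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # whole = integer part of the index i, remainder = 0 exactly when i is whole.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # Whole-number index: average the ith and (i + 1)th values (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise round the index up; 1-based position whole + 1 is list index whole.
    return ordered[whole]


data_25 = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87,
           88, 89, 93, 95, 96, 98, 99, 99]
data_12 = [34, 42, 51, 65, 69, 74, 78, 84, 85, 85, 86, 87]

print(percentile(data_25, 60))  # 82.0
print(percentile(data_12, 80))  # 85
```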
Quartile and Decile

Quartiles
- The first quartile, Q1, is equal to the 25th percentile, indicating that 25% of the observations are below it
- The second quartile, Q2, is equal to the 50th percentile. It is also the median, which splits the data in half
- The third quartile, Q3, is equal to the 75th percentile, indicating that 75% of the observations are below it
- Interquartile range = Q3 − Q1

Deciles
In a similar fashion to quartiles, deciles are nine values that divide the data into ten equal parts

Box plot

Box plot
- A box plot is a graphical representation of the distribution of a data set
- It displays the median, quartiles, and potential outliers of the data, providing a visual summary of its central tendency and spread
- Also known as a box-and-whisker plot

Box plot
Components of a Box Plot
- Box
  - The central box represents the interquartile range (IQR), which includes the middle 50% of the data
  - The edges of the box are the first quartile (Q1) and the third quartile (Q3)
- Median Line
  - A line inside the box represents the median (the 50th percentile), which divides the data into two equal halves
- Whiskers
  - Whiskers extend from the edges of the box to the smallest and largest values that lie within a defined range, typically 1.5 times the IQR below Q1 and above Q3
  - They show the spread of the data outside the middle 50%
- Outliers
  - Data points that fall outside the whiskers are considered outliers and are often marked with individual points or symbols

Box plot
Min and Max as the boundary

- Let's consider an example where we have exam scores for a group of students
  - 55, 60, 65, 70, 72, 75, 78, 80, 83, 85, 88, 90, 92, 95, 100
- Summaries
  - Minimum: 55
  - Q1 (First Quartile): 70
  - Median (Q2): 80
  - Q3 (Third Quartile): 90
  - Maximum: 100

(Figure: box plot with whiskers at 55 and 100, box from 70 to 90, and median line at 80.)

Box plot
1.5 IQR as the boundary

- 30,50,51,53,53,54,54,58,59,60,61,62,62,64,65,67,68,69,80,90
- Summaries
  - Minimum: 30
  - Q1 (First Quartile): 53.5
  - Median (Q2): 60.5
  - Q3 (Third Quartile): 66
  - Maximum: 90
- Lower and upper bound
  - Interquartile Range (IQR) = Q3 - Q1 = 66 - 53.5 = 12.5
  - Lower Bound = 53.5 - 1.5 × 12.5 = 34.75
  - Upper Bound = 66 + 1.5 × 12.5 = 84.75
- Outliers: 30 and 90

(Figure: box plot with whiskers bounded at 1.5 × IQR from the quartiles.)

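The 1.5 × IQR boundaries can be computed directly from the twenty values listed above. The sketch below is illustrative: it reuses the percentile index rule from the lecture (quartile conventions vary between textbooks and software, so other tools may report slightly different quartiles), and all function and variable names are assumptions.

```python
def percentile(values, p):
    """pth percentile by the lecture's index rule (whole-number p, 0 < p < 100)."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # Whole-number index: average the ith and (i + 1)th values (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise take the value at the rounded-up position.
    return ordered[whole]


scores = [30, 50, 51, 53, 53, 54, 54, 58, 59, 60,
          61, 62, 62, 64, 65, 67, 68, 69, 80, 90]

q1 = percentile(scores, 25)
median = percentile(scores, 50)
q3 = percentile(scores, 75)
iqr = q3 - q1

lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Points beyond the bounds are flagged as outliers; the whiskers stop at the
# most extreme values that are still inside the bounds.
outliers = [x for x in scores if x < lower_bound or x > upper_bound]
inside = [x for x in scores if lower_bound <= x <= upper_bound]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, median, q3, iqr)        # 53.5 60.5 66.0 12.5
print(lower_bound, upper_bound)   # 34.75 84.75
print(outliers)                   # [30, 90]
print(whisker_low, whisker_high)  # 50 80
```

With Q1 = 53.5 and Q3 = 66 as listed in the summaries, the two values 30 and 90 fall outside the bounds, and the whiskers end at 50 and 80.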
