Summarizing Data

Chapter Three discusses methods of summarizing data, focusing on measures of central tendency such as mean, median, and mode, as well as measures of variation like range and standard deviation. It highlights the importance of understanding these statistics for effective data analysis, including their advantages and disadvantages. Additionally, it covers concepts like skewness and the use of box-and-whisker plots to visualize data distribution.

CHAPTER THREE

SUMMARIZING DATA
Mengistu Y.

4/25/2025
Objectives
• At the end of this lesson, the student will be able to:
• Identify the different methods of data summarization
• Compute appropriate summary values for a set of data
• Appreciate the properties and limitations of summary
values

Summary Measures

Describing Data Numerically

• Central Tendency: Arithmetic Mean, Median, Mode, Geometric Mean, Quartiles
• Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
• Shape: Skewness

MEASURES OF CENTRAL TENDENCY

• The tendency of statistical data to concentrate around certain values is called the “central tendency” or “average”.
• Mean
• Median
• Mode

The Arithmetic Mean or simple Mean
• The mean is the average of the numbers: add up all the numbers, then divide by how many numbers there are.
• It is written in statistical terms as:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$$

• Example 1: What is the Mean of these numbers? 6, 11, 7
• Add the numbers: 6 + 11 + 7 = 24
• Divide by how many numbers (there are 3 numbers): 24 / 3 = 8
• The Mean is 8
Why Does This Work?
• It is because 6, 11 and 7 added together is the same as 3 lots of 8: 6 + 11 + 7 = 8 + 8 + 8 = 24.
• It is like you are "flattening out" the numbers.
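
The calculation in Example 1 can be sketched in a few lines of Python. This is only an illustration; the function name `arithmetic_mean` is our own and does not come from the slides.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


numbers = [6, 11, 7]
mean = arithmetic_mean(numbers)
print(mean)  # 8.0

# "Flattening out": three copies of the mean add up to the same total.
print(sum(numbers), mean * len(numbers))  # 24 24.0
```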

Example 2
• Birth weights (g) of all live-born infants born at a private hospital in a city during a 1-week period.
• What is the arithmetic mean for the sample birth weights?

Weighted Mean
•When averaging quantities, it is often necessary
to account for the fact that not all of them are
equally important in the phenomenon being
described.

• In order to give the quantities being averaged their proper degree of importance, it is necessary to assign them relative importance, called weights, and then calculate a weighted mean.
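• The weighted mean of a set of numbers X1, X2, … and Xn, whose relative importance is expressed numerically by a corresponding set of numbers w1, w2, … and wn, is given by

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$
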
• Example: In a given drug shop, four different drugs were sold at unit prices of 60, 85, 95 and 50 birr, and the numbers of units sold were 10, 10, 5 and 20 respectively. What is the average price of the four drugs in this drug shop?
• Solution: for this example we use the weighted mean, with the number of units sold as the weight for each drug's price. The average price is therefore 65 birr:

$$\text{Weighted mean} = \frac{60 \times 10 + 85 \times 10 + 95 \times 5 + 50 \times 20}{10 + 10 + 5 + 20} = \frac{2925}{45} = 65$$

• If we don't consider the weights, the average price will be (60 + 85 + 95 + 50) / 4 = 72.5 birr.
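
A minimal Python sketch of the drug-shop calculation follows. The helper name `weighted_mean` and the variable names are our own choices for illustration; the numbers are the ones given in the example.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


prices = [60, 85, 95, 50]      # unit prices in birr
units_sold = [10, 10, 5, 20]   # weights: number of units sold at each price

weighted = weighted_mean(prices, units_sold)
unweighted = sum(prices) / len(prices)
print(weighted, unweighted)  # 65.0 72.5
```

The weighted result is lower than the simple average because the cheapest drug (50 birr) accounts for the largest share of sales.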

Weighted Mean
• We can also calculate a weighted mean using some weighting factor:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

e.g. What is the average income of all people in cities A, B, and C?

City    Avg. Income    Population
A       $23,000        100,000
B       $20,000        50,000
C       $25,000        150,000

Here, population is the weighting factor and the average income is the variable of interest.
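
Working the city example through may help. The following is a sketch that simply plugs the table values into the weighted-mean formula, with population as the weight w and average income as x (the symbol \bar{x}_w is our notation):

```latex
\bar{x}_w
  = \frac{100{,}000 \times 23{,}000 + 50{,}000 \times 20{,}000 + 150{,}000 \times 25{,}000}
         {100{,}000 + 50{,}000 + 150{,}000}
  = \frac{7.05 \times 10^{9}}{300{,}000}
  = 23{,}500
```

The unweighted average of the three city figures would be (23,000 + 20,000 + 25,000) / 3 ≈ 22,667, which is lower because it gives the smallest city the same influence as the largest one.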

Geometric Mean
• The Geometric Mean is a special type of average where we multiply
the numbers together and then take a square root (for two numbers),
cube root (for three numbers) etc.
Example: What is the Geometric Mean of 2 and 18?
• First we multiply them: 2 × 18 = 36
• Then (as there are two numbers) take the square root: √36 = 6

• Geometric Mean of 2 and 18 = √(2 × 18) = 6

• It is like the area is the same: a 2 × 18 rectangle has the same area (36) as a 6 × 6 square.

Example: What is the Geometric Mean of 10, 51.2 and 8?
• First we multiply them: 10 × 51.2 × 8 = 4096
• Then (as there are three numbers) take the cube root: ∛4096 = 16
• For n numbers: multiply them all together and then take the nth root (written ⁿ√ )

• Geometric Mean = ∛(10 × 51.2 × 8) = 16

• It is like the volume is the same: a 10 × 51.2 × 8 box has the same volume (4096) as a 16 × 16 × 16 cube.
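
Both geometric-mean examples can be checked with a short Python sketch. The function name `geometric_mean` is our own; it averages logarithms, which is mathematically the same as taking the nth root of the product and is assumed here only because it behaves better for long lists.

```python
import math


def geometric_mean(values):
    """Return the n-th root of the product of n positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive numbers")
    # exp(mean(log x)) equals the n-th root of the product of the x values.
    log_sum = math.fsum(math.log(v) for v in values)
    return math.exp(log_sum / len(values))


print(round(geometric_mean([2, 18]), 6))        # 6.0
print(round(geometric_mean([10, 51.2, 8]), 6))  # 16.0
```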

Characteristics of mean
• The value of the arithmetic mean is determined by every
item in the series.
• It is greatly affected by extreme values.
Advantages
• It is based on all values given in the distribution.
• It is most easily understood.
• It is most amenable to algebraic treatment.

Disadvantages
• It may be greatly affected by extreme items and its
usefulness as a “Summary of the whole” may be
considerably reduced.
• When the distribution has open-ended classes, its computation would be based on an assumption, and therefore may not be valid.

Median
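• Suppose there are n observations in a sample. If these observations are ordered from smallest to largest, then the median is defined as follows:
• The sample median is
  (1) the ((n + 1)/2)th largest observation if n is odd;
  (2) the average of the (n/2)th and (n/2 + 1)th largest observations if n is even.
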
Example 2
2.1. Compute the sample median for the birth weight data in example 1.
2.2. Consider the following data, which consist of white blood counts taken on admission of all patients entering a small hospital on a given day. Compute the median white-blood count (× 10³).
7, 35, 5, 9, 8, 3, 10, 12, 8
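
For the white-blood-count data, the median can be found by sorting and picking the middle value. The sketch below is one way to do it in Python; `sample_median` is a name we made up for illustration.

```python
def sample_median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


white_blood_counts = [7, 35, 5, 9, 8, 3, 10, 12, 8]
print(sorted(white_blood_counts))         # [3, 5, 7, 8, 8, 9, 10, 12, 35]
print(sample_median(white_blood_counts))  # 8
```

With n = 9 the median is the 5th ordered value. The extreme count of 35 does not move it, whereas it would pull the mean upward.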

i) Characteristics of Median
• It is an average of position/location.
• It is affected more by the number of items than by extreme values.

ii) Advantages
• It is easily calculated and is not much disturbed by extreme
values
• It is more typical of the series
• The median may be located even when the data are incomplete, e.g., when the class intervals are irregular and the final classes have open ends.

iii) Disadvantages
• It is determined mainly by the middle points in a sample and is less sensitive to the actual numerical values of the remaining data points.
• It is not as generally familiar as the arithmetic mean.

Mode
• It is the value of the observation that occurs with the greatest
frequency.
• A particular disadvantage is that, with a small number of
observations, there may be no mode.
• In addition, sometimes, there may be more than one mode
such as when dealing with a bimodal (two-peak) distribution.
• Find the modal values for the following data
a) 22, 66, 69, 70, 73. (No modal value)
b) 1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5 (modal value = 3.0 kg)
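
The two exercises can be checked with a small Python sketch. It assumes the convention used in the answers above, namely that a data set in which every value occurs only once has no mode; the function name `modal_values` is hypothetical.

```python
from collections import Counter


def modal_values(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs once, so no value stands out as a mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modal_values([22, 66, 69, 70, 73]))                                # []
print(modal_values([1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5]))  # [3.0]
print(modal_values([1, 1, 2, 2, 3]))                                     # [1, 2]
```

The last call shows a bimodal case, where two values tie for the greatest frequency.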

Mode
Characteristics
• It is an average of position
• It is not affected by extreme values
• It is the most typical value of the distribution
Advantages
• Since it is the most typical value it is the most descriptive
average
• Since the mode is usually an “actual value”, it indicates the
precise value of an important part of the series.
Disadvantages
• Unless the number of items is fairly large and the
distribution reveals a distinct central tendency, the mode has
no significance
• It is not capable of mathematical treatment
• In a small number of items the mode may not exist.

Skewness:
• If extremely low or extremely high observations are present in a
distribution, then the mean tends to shift towards those scores.
Based on the type of skewness, distributions can be:
• Negatively skewed distribution: occurs when the majority of scores are at the right end of the curve and a few small scores are scattered at the left end.
• Positively skewed distribution: Occurs when the majority of scores
are at the left end of the curve and a few extreme large scores are
scattered at the right end.
• Symmetrical distribution: It is neither positively nor negatively
skewed. A curve is symmetrical if one half of the curve is the mirror
image of the other half.

Skewness…
• Data can be "skewed", meaning it tends to have a long tail on one
side or the other:

• Negative Skew
• Why is it called negative skew? Because the long "tail" is on the negative side of the peak.
• The mean is also on the left of the peak.

Skewness…
The Normal Distribution has No Skew
A Normal Distribution is not skewed.
It is perfectly symmetrical.
And the Mean is exactly at the peak.

Skewness…
Positive Skew
And positive skew is when the long tail is on the
positive side of the peak, and some people say it
is "skewed to the right".
The mean is on the right of the peak value.
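
The direction of skew can also be checked numerically. The slides do not give a formula for skewness, so the sketch below assumes the common moment coefficient (third central moment divided by the 1.5 power of the second); the function name and the data sets are made up for illustration.

```python
import statistics


def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # one extreme large score
left_tailed = [-x for x in right_tailed]        # mirror image
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_tailed), ("left", left_tailed), ("symmetric", symmetric)]:
    print(name, statistics.mean(data), statistics.median(data), round(moment_skewness(data), 2))
```

In the right-tailed set the mean (4.7) lies above the median (3) and the coefficient is positive; the mirrored set gives a negative coefficient, and the symmetric set gives zero.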

Measures of Dispersion
• Which of the distributions of scores has the larger dispersion?

[Figure: two bar charts of score distributions, each with x-axis 1 to 10 and y-axis 0 to 125]

The upper distribution has more dispersion because the scores are more spread out.

Measures of Dispersion

• How “spread out” are the numbers about the centre?

• Consider the following data sets:

         Values                     Mean
Set 1:   60 40 30 50 60 40 70 50    50
Set 2:   50 49 49 51 48 50 53 50    50

• The two data sets given above both have a mean of 50, but set 1 is obviously more “spread out” than set 2. How do we express this numerically?
• Some of the commonly used measures of dispersion (variation) are: range, interquartile range, quartiles, percentiles, variance, standard deviation and coefficient of variation.
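
To see the difference between the two sets numerically, the sketch below computes the range and the sample standard deviation for each, using Python's standard `statistics` module. The helper name `describe_spread` is our own.

```python
import statistics

set_1 = [60, 40, 30, 50, 60, 40, 70, 50]
set_2 = [50, 49, 49, 51, 48, 50, 53, 50]


def describe_spread(values):
    """Return the mean, range and sample standard deviation of the data."""
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "sample_sd": statistics.stdev(values),  # divides by n - 1
    }


spread_1 = describe_spread(set_1)
spread_2 = describe_spread(set_2)
print(spread_1["mean"], spread_1["range"], round(spread_1["sample_sd"], 2))  # 50 40 13.09
print(spread_2["mean"], spread_2["range"], round(spread_2["sample_sd"], 2))  # 50 5 1.51
```

Both sets share the same mean, yet the range and standard deviation of set 1 are several times larger.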
Range and Interquartile Range
• Range
• Simplest and the crudest measure of variation
• Difference between the largest and the smallest observations: Range =
Xlargest – Xsmallest
• Ignores the way in which data are distributed
• It wastes information, since it uses only the two extreme values and ignores the rest of the data.
• Sensitive to outliers
• Interquartile Range
• Eliminate some high- and low-valued observations and calculate the range
from the remaining values
• Interquartile range = 3rd quartile – 1st quartile = Q3 – Q1
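
The contrast between the two measures shows up clearly on the white-blood-count data from the median example, which contain one extreme value (35). The sketch below uses `statistics.quantiles` from the Python standard library with its default "exclusive" method, which interpolates at positions i(n + 1)/4 in the ordered data. That convention is an assumption on our part; other conventions give slightly different quartiles.

```python
import statistics

white_blood_counts = [7, 35, 5, 9, 8, 3, 10, 12, 8]

data_range = max(white_blood_counts) - min(white_blood_counts)

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(white_blood_counts, n=4)
iqr = q3 - q1

print(data_range)   # 32
print(q1, q2, q3)   # 6.0 8.0 11.0
print(iqr)          # 5.0
```

The single outlier inflates the range to 32, while the interquartile range, which discards the lowest and highest quarters of the data, is only 5.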
Quartiles and Percentiles

• The quartiles divide the distribution into four equal parts.

• Deciles: If data is ordered and divided into 10 parts, then cut points
are called Deciles

• Percentiles: If data is ordered and divided into 100 parts, then cut
points are called Percentiles

Quartiles
• The 25th percentile is often referred to as the first quartile and denoted Q1.
• The 50th percentile (the median) is referred to as the second or middle quartile and written Q2, and
• the 75th percentile is referred to as the third quartile, Q3.

When we wish to find the quartiles for a set of data, the following formulas are used

Using the Five-Number Summary to Explore the Shape
• Box-and-Whisker Plot: A Graphical display of data using 5-number
summary:

Minimum, Q1, Median, Q3, Maximum

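• The box and central line are centered between the endpoints if the data are symmetric around the median

[Figure: box-and-whisker plot with Min, Q1, Median, Q3 and Max marked]

Distribution Shape and Box-and-Whisker Plot

[Figure: box-and-whisker plots for left-skewed, symmetric and right-skewed distributions, each with Q1, Q2 and Q3 marked]
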
Standard Deviation and Variance
• They show the scatter of the individual measurements around the mean of all the measurements in a given distribution.
• The variance represents squared units and, therefore, is not an appropriate measure of dispersion when we wish to express this concept in terms of the original units.
• To obtain a measure of dispersion in original units, we merely take the square root of the variance. The result is called the standard deviation.
• Variance is the average of the squared differences from the mean.
• Standard deviation is the square root of the variance.

Variance and Standard Deviation

Population: $$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample: $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

SD = √variance
To calculate standard deviation
1. Calculate the mean: $\bar{x}$
2. Calculate the residual for each x: $x - \bar{x}$
3. Square the residuals: $(x - \bar{x})^2$
4. Calculate the sum of the squares: $\sum (x - \bar{x})^2$
5. Divide the sum in Step 4 by (n-1): $\frac{\sum (x - \bar{x})^2}{n - 1}$
6. Take the square root of the quantity in Step 5: $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
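
The six steps translate almost line for line into Python. The sketch below is illustrative only; the function name is our own, and the data are the ten family sizes (3, 3, 4, 4, 5, 5, 6, 6, 7, 7) used in this chapter's ungrouped-data example.

```python
import math


def sample_standard_deviation(values):
    """Follow the six steps for the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n                        # Step 1: the mean
    residuals = [x - mean for x in values]        # Step 2: residual for each x
    squared = [r ** 2 for r in residuals]         # Step 3: square the residuals
    sum_of_squares = sum(squared)                 # Step 4: sum of the squares
    variance = sum_of_squares / (n - 1)           # Step 5: divide by n - 1
    return math.sqrt(variance)                    # Step 6: square root


family_sizes = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
s = sample_standard_deviation(family_sizes)
print(round(s ** 2, 2), round(s, 2))  # 2.22 1.49
```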

Example: Find the Standard Deviation of Ungrouped Data

Family No.   1  2  3  4  5  6  7  8  9  10
Size (xi)    3  3  4  4  5  5  6  6  7  7

Here, $\bar{x} = \frac{\sum x_i}{n} = \frac{50}{10} = 5$

Family No.     1   2   3   4   5   6   7   8   9   10   Total
xi             3   3   4   4   5   5   6   6   7   7    50
xi − x̄        −2  −2  −1  −1   0   0   1   1   2   2    0
(xi − x̄)²      4   4   1   1   0   0   1   1   4   4    20

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{20}{9} \approx 2.22, \qquad s = \sqrt{2.22} \approx 1.49$$

Example
• The lengths of five newborn babies are: 600mm, 470mm, 170mm, 430mm and 300mm.
• Find the Mean, the Variance, and the Standard Deviation.
• Your first step is to find the Mean:
• Answer:
• Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394
• so the mean (average) length is 394 mm.

To calculate the Variance, take each difference from the Mean, square it, and then average the result:

Variance: σ² = (206² + 76² + (−224)² + 36² + (−94)²) / 5 = 108,520 / 5 = 21,704

Standard Deviation

σ = √21,704
= 147.32...
= 147 (to the nearest mm)
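
This example divides by the number of values (5), which is the population form of the variance. As a sketch, the standard-library `statistics` module can reproduce the result and show what the sample form (dividing by n - 1) would give for the same five values; the variable names are our own.

```python
import statistics

lengths_mm = [600, 470, 170, 430, 300]

mean = statistics.mean(lengths_mm)

# Population form: divide the sum of squared differences by N.
population_variance = statistics.pvariance(lengths_mm)
population_sd = statistics.pstdev(lengths_mm)

# Sample form: divide by n - 1 instead.
sample_variance = statistics.variance(lengths_mm)
sample_sd = statistics.stdev(lengths_mm)

print(mean)                                         # 394
print(population_variance, round(population_sd))    # 21704 147
print(sample_variance, round(sample_sd, 1))         # 27130 164.7
```

Which divisor is appropriate depends on whether the five values are treated as the whole population or as a sample from a larger one.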
Coefficient of Variation

• Measures relative variation

• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more sets of data
measured in different units
$$CV = \frac{S}{\bar{X}} \times 100\%$$

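
A short Python sketch of the coefficient of variation follows. The function name is hypothetical, the sample standard deviation is assumed for S, and the data are the family sizes from the standard-deviation example.

```python
import statistics


def coefficient_of_variation(values):
    """Return (sample standard deviation / mean) * 100, as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


family_sizes = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
cv = coefficient_of_variation(family_sizes)
print(round(cv, 1))  # 29.8

# Changing the unit of measurement (here, multiplying by 10) leaves CV unchanged.
rescaled = [10 * x for x in family_sizes]
print(round(coefficient_of_variation(rescaled), 1))  # 29.8
```

Because the units cancel in the ratio, the result is a pure percentage, which is what allows comparisons between data sets measured in different units.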
Thank you!
