

Chap6 STAT 2

Chapter 6 introduces basic concepts of statistics, including descriptive and inferential statistics, and discusses techniques for summarizing data. It covers graphical representations such as pie charts and bar charts, as well as numerical summary measures like mean, median, mode, and standard deviation. The chapter emphasizes the importance of understanding populations and samples in statistical analysis.

Uploaded by Bảo Châu
Copyright © All Rights Reserved

Chapter 6: Introduction to Statistics

Nguyen Minh Tri

University of Information Technology

April 10, 2025

Basic concepts of statistics

Graphical representation of data

Numerical summary measures

6.1 Basic concepts of statistics


Definition 6.1 Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
Definition 6.2
1. Descriptive statistics are procedures used to summarize and describe the important
characteristics of a set of measurements.
2. Inferential statistics are techniques and methods used to analyze a small, specific set
of data in order to draw a conclusion about a larger, more general collection of data.
In this chapter, we present some basic techniques in descriptive statistics, the branch of
statistics concerned with describing sets of measurements, both samples and populations.
Example 6.3 In January 2024, https://statisticsanddata.org/ published the number of
users of social networking platforms up to January 2024.

The illustration above is a form of descriptive statistics.


Example 6.4 In May 2023, Statista released data on the most popular programming
languages worldwide.

Example 6.5 It is not practical to measure the height of every student in your university.
Instead, you can take a sample of students and use inferential statistics to estimate the
average height of the entire school population.
Example 6.6 Determine whether descriptive or inferential statistics were used in each statement.
1. The average price of a 30-second ad for the Academy Awards show in a recent year
was 1.90 million dollars.
2. The Department of Economic and Social Affairs predicts that the population of
Mexico City, Mexico, in 2030 will be 238,647,000 people.
3. A medical report stated that taking statins is proven to lower the risk of heart attacks, but
some people are at a slightly higher risk of developing diabetes when taking statins.
4. A survey of 2234 people conducted by the Harris Poll found that 55% of the
respondents said that excessive complaining by adults was the most annoying social
media habit.
Descriptive statistics: statements 1 and 4.
Inferential statistics: statements 2 and 3.
Whether we are summarizing data or making an inference, every statistics problem involves a
population and a sample.
Definition 6.7
1. A population is the set of all measurements of interest to the investigator.
2. A sample is a subset of measurements selected from the population of interest.
3. A variable is a characteristic that changes or varies over time and/or for different
individuals or objects under consideration.
Example 6.8 A set of five students is selected from all undergraduates at a large university,
and measurements are entered into a spreadsheet as shown in the following table.
Student Year GPA Gender Major
1 1 7.8 Male Mathematics
2 3 6.9 Male Physics
3 2 9.1 Female English
4 4 8.4 Female English
5 2 8.1 Male Business
Identify the various elements involved in obtaining this set of measurements.
Four variables are measured for each student: grade point average (GPA), gender,
year in college, and major.
If we consider the GPAs of all students at this university to be the population of
interest, the five GPAs represent a sample from this population. If the GPA of each
undergraduate student at the university had been measured, we would have the
entire population of measurements for this variable.
Definition 6.9
1. Qualitative variables (or categorical variables) measure a quality or characteristic on
each unit.
2. Quantitative variables measure a numerical quantity or amount on each unit.
Example 6.10
Qualitative variables: gender, year, and major.
Quantitative variables: GPA; the number of consumers who refuse to answer a telephone survey.

6.2 Graphical representation of data


A statistical table is a list of the categories along with a measure of how often each
value occurred, given as either
1. the frequency, or number of measurements in each category; or
2. the relative frequency, or proportion of measurements in each category.
Once the measurements have been summarized in a statistical table, you can use
either a pie chart or a bar chart to display the distribution of the data.
A pie chart is the familiar circular graph that shows how the measurements are
distributed among the categories.
A bar chart shows the same distribution of measurements among the categories,
with the height of the bar measuring how often a particular category was observed.
Example 6.11 In a public education survey, 400 school administrators were asked to rate
the quality of education in the United States. Their responses are summarized in the
following table.
Rating Frequency
A 35
B 260
C 93
D 12
Total 400
Construct a pie chart and a bar chart for this set of data.
Calculations for the pie chart
Rating   Frequency   Relative frequency   Percent
A        35          35/400 ≈ 0.09        9%
B        260         260/400 = 0.65       65%
C        93          93/400 ≈ 0.23        23%
D        12          12/400 = 0.03        3%
Total    400         1                    100%

$\dfrac{f}{n}$: the relative frequency for the class, where
$f$: the frequency of the class
$n$: the sum of all frequencies
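The relative frequency of each category is its frequency divided by the total, so the pie-chart calculations for Example 6.11 can be reproduced in a few lines. This is a minimal Python sketch; the dictionary layout and the helper name `relative_frequencies` are illustrative assumptions, not part of the notes. It also prints each sector's angle, 360° times the relative frequency, which is how a pie chart divides the circle.

```python
# Frequencies from Example 6.11 (400 school administrators rating U.S. education)
frequencies = {"A": 35, "B": 260, "C": 93, "D": 12}


def relative_frequencies(freq):
    """Return {category: f / n}, where n is the sum of all frequencies."""
    n = sum(freq.values())
    return {category: f / n for category, f in freq.items()}


rel = relative_frequencies(frequencies)
for rating, r in rel.items():
    # percent column of the table, and the pie-chart sector angle (r * 360 degrees)
    print(f"{rating}: relative frequency {r:.4f}, {100 * r:.2f}%, angle {360 * r:.1f} deg")
```

The unrounded relative frequencies are 0.0875, 0.65, 0.2325 and 0.03; the table rounds them to two decimals.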

Line chart: When a quantitative variable is recorded over time at equally spaced intervals
(such as daily, weekly, monthly, quarterly, or yearly), the data set forms a time series. Time
series data are most effectively presented on a line chart with time as the horizontal axis.
Example 6.12 The United States Bureau of the Census gives projections for the portion
of the U.S. population that will be 85 and over in the coming years.
Year 2020 2030 2040 2050 2060
85 and over (millions) 6.7 9.1 14.6 19.0 19.7

Stem-and-leaf plot: each observation in the data set must have at least two digits. Think of
each observation as consisting of two pieces, a stem and a leaf. For example, the number 372
could be split into the pieces 37 (the first two digits) and 2 (the last digit).
To construct a stem-and-leaf plot:
1. Split each observation into a
Stem: one or more of the leading, or left-hand, digits; and a
Leaf: the trailing, or remaining, digit(s) to the right.
2. Write a sequence of stems in a column, from the smallest occurring stem to the
largest. Include all stems between the smallest and largest, even if there are no
corresponding leaves.
3. List all the digits of each leaf next to its corresponding stem. It is not necessary to
put the leaves in increasing order, but make sure the leaves line up vertically.
4. Indicate the units for the stems and leaves.
Example 6.13 The following table lists the prices (in dollars) of 19 different brands of walking shoes.
90 70 70 70 75 70 65
68 60 74 70 95 75 70
68 65 40 65 70
Use a stem and leaf plot to display the data.
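4 | 0
5 |
6 | 5 8 0 8 5 5
7 | 0 0 0 5 0 4 0 5 0 0
8 |
9 | 0 5
(Stem: tens of dollars; leaf: ones of dollars. For example, 7 | 4 represents 74 dollars.)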
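Steps 1 to 4 are mechanical enough to automate. The Python sketch below is one possible implementation, assuming non-negative data and a one-digit leaf; the function `stem_and_leaf` and its `leaf_unit` parameter are hypothetical names. On the shoe prices of Example 6.13 it reproduces the plot above, with leaves kept in data order.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Return the lines of a stem-and-leaf plot for non-negative numbers.

    Each value is rescaled to a whole number of leaf units; its last digit
    is the leaf and the remaining leading digits form the stem.
    """
    leaves = defaultdict(list)
    for v in values:
        scaled = round(v / leaf_unit)      # e.g. 75 -> 75; 7.2 with leaf_unit=0.1 -> 72
        stem, leaf = divmod(scaled, 10)    # step 1: split into stem and leaf
        leaves[stem].append(leaf)          # step 3: leaves kept in data order
    lines = []
    # step 2: every stem from smallest to largest, including empty ones
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(d) for d in leaves[stem])
        lines.append(f"{stem} | {row}".rstrip())
    return lines


prices = [90, 70, 70, 70, 75, 70, 65,
          68, 60, 74, 70, 95, 75, 70,
          68, 65, 40, 65, 70]
plot = stem_and_leaf(prices)
print("\n".join(plot))
print("Stem: tens of dollars; leaf: ones of dollars")   # step 4: state the units
```

This sketch does not split a stem over several lines; that refinement is described next.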
When there are few stems and a large number of leaves within each stem, each stem is usually
divided in one of two ways:
1. Into two lines, with leaves 0-4 in the first line and leaves 5-9 in the second line
2. Into five lines, with leaves 0-1, 2-3, 4-5, 6-7, and 8-9 in the five lines, respectively
Example 6.14 The birth weights of 30 full-term babies, recorded to the nearest tenth of
a pound, are:
7.2 7.8 6.8 6.2 8.2 8.0 8.2 5.6 8.6 7.1
8.2 7.7 7.5 7.2 7.7 5.8 6.8 6.8 8.5 7.5
6.1 7.9 9.4 9.0 7.8 8.5 9.0 7.7 6.7 7.7
Construct a stem and leaf plot to display the distribution of the data.
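5 | 6 8
6 | 1 2
6 | 8 8 8 7
7 | 2 2 1
7 | 8 7 9 5 7 7 5 8 7
8 | 0 2 2 2
8 | 5 6 5
9 | 0 4 0
(Leaf unit: 0.1 pound. Each stem is split into two lines: leaves 0-4 on the first, 5-9 on the second.)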
Relative frequency histograms: the following definitions and terms are commonly used when
constructing relative frequency histograms.
Definition 6.15
1. A class is a subinterval created when you divide up the interval from the smallest to
the largest measurement.
2. The class boundaries are the numbers that create the upper and lower limits of the
class.
3. The class width is the difference between the upper and lower class boundaries.
4. The class frequency is the number of measurements falling into that particular class.
5. A relative frequency histogram for a quantitative data set is a bar graph in which the
height of the bar shows “how often” (measured as a proportion or relative
frequency) measurements fall into each subinterval or class. The classes or
subintervals are plotted along the horizontal axis.
Example 6.16 The birth weights of 30 full-term babies, recorded to the nearest tenth of
a pound, are:
7.2 7.8 6.8 6.2 8.2 8.0 8.2 5.6 8.6 7.1
8.2 7.7 7.5 7.2 7.7 5.8 6.8 6.8 8.5 7.5
6.1 7.9 9.4 9.0 7.8 8.5 9.0 7.7 6.7 7.7
Construct a relative frequency histogram to display the distribution of the data.
Class   Class boundaries   Frequency   Relative frequency
1       [5.6; 6.1)         2           2/30
2       [6.1; 6.6)         2           2/30
3       [6.6; 7.1)         4           4/30
4       [7.1; 7.6)         5           5/30
5       [7.6; 8.1)         8           8/30
6       [8.1; 8.6)         5           5/30
7       [8.6; 9.1)         3           3/30
8       [9.1; 9.6)         1           1/30
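The class frequencies of Example 6.16 can be checked by counting how many weights fall in each interval [a; a + 0.5). A sketch in Python, assuming equal-width classes that cover all the data; `class_table` is a hypothetical helper. The weights are converted to whole tenths of a pound before binning, because comparing floating-point values such as 6.1 against a computed boundary can misplace a value lying exactly on it.

```python
weights = [7.2, 7.8, 6.8, 6.2, 8.2, 8.0, 8.2, 5.6, 8.6, 7.1,
           8.2, 7.7, 7.5, 7.2, 7.7, 5.8, 6.8, 6.8, 8.5, 7.5,
           6.1, 7.9, 9.4, 9.0, 7.8, 8.5, 9.0, 7.7, 6.7, 7.7]


def class_table(data, start, width, k, scale=10):
    """Return (lower, upper, frequency, relative frequency) for k classes [lower, upper).

    Values are converted to integers (tenths when scale=10) so that a value
    equal to a lower class boundary, such as 6.1, is always counted in that class.
    """
    lo, w = round(start * scale), round(width * scale)
    counts = [0] * k
    for x in data:
        counts[(round(x * scale) - lo) // w] += 1   # index of the class containing x
    n = len(data)
    return [((lo + i * w) / scale, (lo + (i + 1) * w) / scale, f, f / n)
            for i, f in enumerate(counts)]


table = class_table(weights, start=5.6, width=0.5, k=8)
for a, b, f, r in table:
    print(f"[{a:.1f}; {b:.1f})  frequency {f}  relative {f}/{len(weights)} = {r:.3f}")
```

The heights of the histogram bars are the relative frequencies in the last column; the seventh class, for instance, has relative frequency 3/30 = 0.1.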
6.3 Numerical summary measures
Definition 6.17 Numerical measures associated with a population of measurements are
called parameters; those calculated from sample measurements are called statistics.
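Definition 6.18 The arithmetic mean or average of a set of measurements is equal to
the sum of the measurements divided by the number of measurements.
Sample mean ($\bar{x}$, "x-bar") of $n$ measurements: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
Population mean of $N$ measurements: $\mu = \dfrac{1}{N}\sum_{i=1}^{N} x_i$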
Example 6.19 The temperature (in degrees Fahrenheit) at a base camp for 12 randomly
selected days is given in the following table

6 11 20 19 23 28 30 8 23 25 29 33

Find the sample mean temperature at the base camp.

$\bar{x} = \dfrac{1}{12}\sum_{i=1}^{12} x_i = \dfrac{1}{12}(6 + 11 + 20 + 19 + 23 + 28 + 30 + 8 + 23 + 25 + 29 + 33) = \dfrac{255}{12} = 21.25$
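The arithmetic of Example 6.19 as a short Python sketch; `sample_mean` is an illustrative name (Python's `statistics.mean` returns the same value).

```python
temps = [6, 11, 20, 19, 23, 28, 30, 8, 23, 25, 29, 33]  # Example 6.19, degrees F


def sample_mean(xs):
    """x-bar: the sum of the measurements divided by their number n."""
    return sum(xs) / len(xs)


x_bar = sample_mean(temps)
print(f"sum = {sum(temps)}, n = {len(temps)}, x-bar = {x_bar}")  # sum = 255, n = 12, x-bar = 21.25
```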

Definition 6.20 The sample median, denoted $\tilde{x}$, of the $n$ observations $x_1, x_2, \dots, x_n$ is
the middle number when the observations are arranged in order from smallest to largest.
1. If $n$ is odd, then the sample median is the single middle value (in position $(n+1)/2$).
2. If $n$ is even, then the sample median is the mean of the two middle values (in positions $n/2$ and $n/2 + 1$).
Example 6.21 The following three examples show how to find the median under various
circumstances.
Observations            Median
10 11 14 16 17          $\tilde{x} = 14$
10 11 14 16 57          $\tilde{x} = 14$
10 11 14 16 17 20       $\tilde{x} = (14 + 16)/2 = 15$
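The two cases of Definition 6.20 translate directly into code. A minimal Python sketch (the name `sample_median` is illustrative), applied to the three data sets of Example 6.21 and printing the mean alongside for comparison.

```python
def sample_median(xs):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # position (n + 1) / 2, counting from 1
    return (s[mid - 1] + s[mid]) / 2    # positions n / 2 and n / 2 + 1


for data in ([10, 11, 14, 16, 17], [10, 11, 14, 16, 57], [10, 11, 14, 16, 17, 20]):
    print(data, "median:", sample_median(data), "mean:", sum(data) / len(data))
```

Replacing 17 by 57 raises the mean from 13.6 to 21.6 but leaves the median at 14: the median is much less sensitive to an extreme value than the mean.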
Definition 6.22 The mode of a set of observations is the value that occurs most often,
that is, with the greatest frequency.
If all the observations occur with the same frequency, then the mode does not exist.
If two or more observations occur with the same greatest frequency, then the mode
is not unique.
When measurements on a continuous variable have been grouped as a frequency or
relative frequency histogram, the class with the highest frequency is called the
modal class, and the midpoint of that class is taken to be the mode.
Example 6.23
a. The mode for the Starbucks data is 5.
b. For the birth weight data: using the data in the table, the mode is 7.7 (it occurs 4 times).
Using the histogram, the class with the highest bar contains the weights between 7.6 and 8.1,
so the mode is taken to be the midpoint of that class, (7.6 + 8.1)/2 = 7.85.
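Counting how often each value occurs confirms the mode of the birth weight data. This Python sketch uses `collections.Counter`; the helper `modes` is a hypothetical name that follows Definition 6.22: it returns every value sharing the highest frequency, so an empty list means there is no mode and more than one value means the mode is not unique.

```python
from collections import Counter

# Birth weights (pounds) from Example 6.14
weights = [7.2, 7.8, 6.8, 6.2, 8.2, 8.0, 8.2, 5.6, 8.6, 7.1,
           8.2, 7.7, 7.5, 7.2, 7.7, 5.8, 6.8, 6.8, 8.5, 7.5,
           6.1, 7.9, 9.4, 9.0, 7.8, 8.5, 9.0, 7.7, 6.7, 7.7]


def modes(xs):
    """Return the most frequent value(s); [] when every distinct value ties."""
    counts = Counter(xs)
    top = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == top)
    if len(winners) == len(counts) and len(counts) > 1:
        return []          # all values occur equally often: no mode
    return winners         # more than one entry: the mode is not unique


print(modes(weights), Counter(weights)[7.7])   # [7.7] 4

# Modal class [7.6; 8.1) of the histogram: its midpoint is used as the mode
modal_midpoint = (7.6 + 8.1) / 2
print(round(modal_midpoint, 2))                # 7.85
```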
Definition 6.24 For observations on a categorical variable with only two responses, the
sample proportion of successes, denoted $\hat{p}$, is the relative frequency of occurrence of
successes:
$\hat{p} = \dfrac{\text{number of successes in the sample}}{\text{total number of responses}}$

The population proportion of successes is denoted by $p$.


Example 6.25 The temperature (in degrees Fahrenheit) at a base camp for 12 randomly
selected days is given in the following table

6 11 20 19 23 28 30 8 23 25 29 33
Find the sample proportion of days on which the temperature was more than 20.
Solution. Seven of the 12 days (23, 28, 30, 23, 25, 29, 33) had temperatures above 20, so the
sample proportion is $\hat{p} = 7/12 \approx 0.583$.
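A sketch of Example 6.25 in Python, taking "success" to mean a temperature strictly above 20, so the day recorded at exactly 20 is not counted.

```python
# Temperatures (degrees F) for 12 days, Example 6.25
temps = [6, 11, 20, 19, 23, 28, 30, 8, 23, 25, 29, 33]

# A "success" is a day strictly above 20 degrees; the day at exactly 20 does not count
successes = sum(1 for t in temps if t > 20)
p_hat = successes / len(temps)
print(successes, len(temps), round(p_hat, 3))   # 7 12 0.583
```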
Definition 6.26 The range, $R$, of a set of measurements is defined as the difference
between the largest and smallest measurements.
Example 6.27
a. Range of the Starbucks data: $R = $ ______
b. Range of the birth weight data: $R = 9.4 - 5.6 = 3.8$
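For the birth weight data of Example 6.14 the range is a one-line computation; a minimal Python sketch:

```python
# Birth weights (pounds) from Example 6.14
weights = [7.2, 7.8, 6.8, 6.2, 8.2, 8.0, 8.2, 5.6, 8.6, 7.1,
           8.2, 7.7, 7.5, 7.2, 7.7, 5.8, 6.8, 6.8, 8.5, 7.5,
           6.1, 7.9, 9.4, 9.0, 7.8, 8.5, 9.0, 7.7, 6.7, 7.7]

R = max(weights) - min(weights)                  # largest minus smallest measurement
print(max(weights), min(weights), round(R, 1))   # 9.4 5.6 3.8
```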
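Definition 6.28
1. The variance of a population of $N$ measurements is
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$,
where $\mu$ is the population mean.
2. The variance of a sample of $n$ measurements is
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$,
where $\bar{x}$ is the sample mean.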
Example 6.29 A comparison of coffee prices at 4 randomly selected grocery stores in Thu Duc
showed increases over last month of 12, 15, 17, and 20 thousand VND per 1 kg bag.
Find the mean and variance of this sample.
Sample mean
$\bar{x} = \dfrac{12 + 15 + 17 + 20}{4} = \dfrac{64}{4} = 16$ (thousand VND)
Sample variance
$s^2 = \dfrac{(12 - 16)^2 + (15 - 16)^2 + (17 - 16)^2 + (20 - 16)^2}{4 - 1} = \dfrac{16 + 1 + 1 + 16}{3} = \dfrac{34}{3} \approx 11.33$ (thousand VND)$^2$
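The two variance formulas of Definition 6.28 differ only in the divisor. The Python sketch below (function names are illustrative) computes the sample variance for Example 6.29 and, for contrast, the population version.

```python
import math

# Price increases (thousand VND per 1 kg bag), Example 6.29
increases = [12, 15, 17, 20]


def sample_variance(xs):
    """s^2: sum of squared deviations from x-bar, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def population_variance(xs):
    """sigma^2: sum of squared deviations from mu, divided by N."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N


x_bar = sum(increases) / len(increases)
s2 = sample_variance(increases)
print(x_bar, round(s2, 2), round(math.sqrt(s2), 2))   # 16.0 11.33 3.37
```

Treating the four stores as a whole population would divide by N = 4 and give 8.5 instead of 34/3; the standard library's `statistics.variance` and `statistics.pvariance` make the same distinction.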
Definition 6.30 The standard deviation of a set of measurements is equal to the positive
square root of the variance.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$

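Suppose the $n$ measurements are divided into $k$ classes, and let
$m_1, m_2, \dots, m_k$: the midpoints of the classes
$f_1, f_2, \dots, f_k$: the successive class frequencies, with $n = f_1 + f_2 + \dots + f_k$
The mean and variance of that sample are
$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{k} f_i m_i$ and $s^2 = \dfrac{1}{n - 1}\sum_{i=1}^{k} f_i (m_i - \bar{x})^2$,
and the standard deviation is $s = \sqrt{s^2}$.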
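Example 6.31 The heights of 40 students are grouped as follows.
Class         Frequency   Midpoint
(155; 160]    8           157.5
(160; 165]    10          162.5
(165; 170]    15          167.5
(170; 175]    5           172.5
(175; 180]    2           177.5
Sample mean
$\bar{x} = \dfrac{8(157.5) + 10(162.5) + 15(167.5) + 5(172.5) + 2(177.5)}{40} = \dfrac{6615}{40} = 165.375$
Sample variance
$s^2 = \dfrac{8(157.5 - 165.375)^2 + 10(162.5 - 165.375)^2 + 15(167.5 - 165.375)^2 + 5(172.5 - 165.375)^2 + 2(177.5 - 165.375)^2}{39} = \dfrac{1194.375}{39} = 30.625$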
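For grouped data, each class is represented by its midpoint, (lower + upper)/2, weighted by its frequency. A Python sketch for Example 6.31; `grouped_mean_variance` is a hypothetical helper, and the results are approximations because the individual heights are not available.

```python
import math

# Example 6.31: (lower, upper, frequency) for classes (lower; upper]
classes = [(155, 160, 8), (160, 165, 10), (165, 170, 15), (170, 175, 5), (175, 180, 2)]


def grouped_mean_variance(table):
    """Approximate n, x-bar and s^2 by placing every value at its class midpoint."""
    n = sum(f for _, _, f in table)
    mids = [((a + b) / 2, f) for a, b, f in table]
    x_bar = sum(f * m for m, f in mids) / n
    s2 = sum(f * (m - x_bar) ** 2 for m, f in mids) / (n - 1)
    return n, x_bar, s2


n, x_bar, s2 = grouped_mean_variance(classes)
print(n, x_bar, s2, round(math.sqrt(s2), 3))   # 40 165.375 30.625 5.534
```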
Definition 6.32 Let $x_1, x_2, \dots, x_n$ be a set of $n$ observations. The quartiles divide the data
into four parts.
1. The first (lower) quartile, denoted $Q_1$, is the median of the lower half of the
observations when they are arranged in ascending order.
2. The second quartile is the median: $Q_2 = \tilde{x}$.
3. The third (upper) quartile, denoted $Q_3$, is the median of the upper half of the
observations when they are arranged in ascending order.
4. The interquartile range, denoted IQR, is the difference $\mathrm{IQR} = Q_3 - Q_1$.
Example 6.33 A sample of 20 people yielded the following weekly television viewing times, in hours:
25 41 27 32 43 66 35 31 15 5 34 26 32 38 16 30 38 30 20 21
Determine the quartiles for these data.
Solution.
1. Arrange the data in increasing order.
5 15 16 20 21 25 26 27 30 30 31 32 32 34 35 38 38 41 43 66
2. The median is the second quartile: $Q_2 = (30 + 31)/2 = 30.5$.
3. Divide the ordered data set into two halves:
5 15 16 20 21 25 26 27 30 30 | 31 32 32 34 35 38 38 41 43 66
4. The median of the bottom half of the data set is the first quartile: $Q_1 = (21 + 25)/2 = 23$.
5. The median of the top half of the data set is the third quartile: $Q_3 = (35 + 38)/2 = 36.5$.
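The "median of each half" rule of Definition 6.32 as a Python sketch for Example 6.33. The definition does not say where the middle observation goes when n is odd; this sketch assumes it is left out of both halves. Library routines such as Python's `statistics.quantiles` or NumPy's `percentile` interpolate between observations and can return slightly different quartiles.

```python
def median_sorted(s):
    """Median of a list that is already in ascending order."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def quartiles(xs):
    """Return (Q1, Q2, Q3) as medians of the lower half, whole set, and upper half.

    Assumption: when n is odd the middle observation belongs to neither half.
    """
    s = sorted(xs)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + len(s) % 2:]        # skips the middle value when n is odd
    return median_sorted(lower), median_sorted(s), median_sorted(upper)


tv_hours = [25, 41, 27, 32, 43, 66, 35, 31, 15, 5,
            34, 26, 32, 38, 16, 30, 38, 30, 20, 21]
q1, q2, q3 = quartiles(tv_hours)
print(q1, q2, q3, "IQR =", q3 - q1)      # 23.0 30.5 36.5 IQR = 13.5
```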
The summary information contained in the quartiles is highlighted in a boxplot.
Definition 6.34 A boxplot is a graph of a data set obtained by drawing a horizontal line
from the minimum data value to $Q_1$, drawing a horizontal line from $Q_3$ to the maximum
data value, and drawing a box whose vertical sides pass through $Q_1$ and $Q_3$, with a vertical
line inside the box passing through the median, $Q_2$.
Example 6.35 The number of meteorites found in 10 states of the United States is 89, 47,
164, 296, 30, 215, 138, 78, 48, 39. Construct a boxplot for the data.
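Solution.
1. Arrange the data in order: 30, 39, 47, 48, 78, 89, 138, 164, 215, 296
2. Median: $Q_2 = (78 + 89)/2 = 83.5$
3. $Q_1 = 47$ (median of 30, 39, 47, 48, 78) and $Q_3 = 164$ (median of 89, 138, 164, 215, 296)
4. min $= 30$ and max $= 296$
5. Draw the boxplot.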
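The five numbers needed for the boxplot of Example 6.35 (minimum, $Q_1$, median, $Q_3$, maximum) can be computed and drawn roughly as text. In this Python sketch, `five_number_summary` uses the median-of-halves rule (excluding the middle value when n is odd, an assumption), and `text_boxplot` is a hypothetical helper that only approximates positions on a character grid.

```python
def median_sorted(s):
    """Median of a list that is already in ascending order."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def five_number_summary(xs):
    """(min, Q1, median, Q3, max); halves exclude the middle value when n is odd."""
    s = sorted(xs)
    half = len(s) // 2
    return (s[0], median_sorted(s[:half]), median_sorted(s),
            median_sorted(s[half + len(s) % 2:]), s[-1])


def text_boxplot(summary, lo, hi, width=61):
    """Hypothetical helper: draw a horizontal boxplot with characters.

    '-' whiskers, '=' box, '[' and ']' at Q1 and Q3, '|' at the median.
    Assumes lo <= min and max <= hi.
    """
    def pos(v):
        return round((v - lo) / (hi - lo) * (width - 1))

    mn, q1, q2, q3, mx = (pos(v) for v in summary)
    row = [" "] * width
    for i in range(mn, mx + 1):
        row[i] = "-"      # whiskers span min to max; the box is drawn over them
    for i in range(q1, q3 + 1):
        row[i] = "="
    row[q1], row[q3], row[q2] = "[", "]", "|"
    return "".join(row)


meteorites = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
summary = five_number_summary(meteorites)
print(summary)                     # (30, 47, 83.5, 164, 296)
print(text_boxplot(summary, 0, 300))
```

The long right whisker (from 164 to 296) shows that the data are skewed to the right.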
Example 6.36 During the laying of gas pipelines, the depth of the pipeline (in mm) must
be controlled. One service provider recorded the following depths:
418 428 431 420 412 425 423 433 417 420 410 431 429 425
Find the sample mean, sample standard deviation, median.
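One way to check answers to Example 6.36 is Python's standard `statistics` module: `mean`, `stdev` (sample standard deviation, divisor n - 1, as in Definition 6.28) and `median`. A sketch:

```python
import statistics

# Pipeline depths in mm, Example 6.36
depths = [418, 428, 431, 420, 412, 425, 423, 433, 417, 420, 410, 431, 429, 425]

x_bar = statistics.mean(depths)      # sum / n
s = statistics.stdev(depths)         # sample standard deviation, divisor n - 1
med = statistics.median(depths)      # n = 14 is even: mean of the 7th and 8th ordered values
print(x_bar, round(s, 3), med)
```

`statistics.pstdev` would instead use the population divisor N.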

Example 6.37 A sample of 15 participants reported that they now smoke the following
number of cigarettes per day.
10 9 10 8 7 6 10 9 10 8 9 10 8 8 10
a. Determine the quartiles for these data.
b. Draw a boxplot.
