Intro to Statistics for Researchers
Statistics is needed in research because it gives meaning to the data collected during
the process. It helps the researcher draw conclusions and answer the questions
posed at the start of the research.
a. Collection of data- the process of gathering information through experiments,
interviews, surveys, questionnaires, etc.
b. Presentation of data- organizing the collected data into tables, graphs, and
text
c. Analysis of data- the process of extracting important information from the data
collected using an appropriate statistical tool
d. Interpretation of data- the process of drawing answers from the analyzed data.
Conclusions about a large group can be formulated from the data gathered
from a small group.
Research deals with variables. Variables are attributes that can assume different
values. They can be classified according to:
Data mean the observations or measurements of variables. There are two kinds of
data that can be observed or measured:
height - 150 cm, 5 feet
Example:
This level of measurement can be ranked or ordered, but exact differences
between the ranks do not exist.
This level of measurement can also be ranked or ordered, and exact differences
between the values exist. However, there is no true zero.
Example: temperature (37°C is measurably different from 38°C), but 0°C does not
mean that there is no heat at all. In physics, according to the Third Law of
Thermodynamics, absolute zero is unattainable.
IQ test (an IQ score of 110 is different from a score of 109), but you cannot say
that an unintelligent person has an IQ of zero.
This level of measurement can also be ranked or ordered, and exact differences
between the values exist. In addition, there is a meaningful zero.
Example: salary, weight, velocity, displacement
POPULATION VS SAMPLE
Population- the totality of subjects (individuals, objects, places, reactions, and events)
with common characteristics that are being studied
MEASURE OF CENTRAL TENDENCY
The measures of central tendency, also known as the measures of averages include
the mean, median, mode and midrange.
a. THE MEAN
The mean is the sum of the values, divided by the total number of values. The symbol
X̄ represents the sample mean.
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum X}{n}$$
where n represents the total number of values in the sample. For a population, the
Greek letter μ (mu) is used for the mean.
$$\mu = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum X}{N}$$
where N represents the total number of values in the population.
You can solve for the mean by simply adding the values of the data and dividing by
the total number of values.
SAMPLE PROBLEM 1: The data represent the number of days off per year for a
sample of individuals selected from nine different countries. Find the mean.
SOLUTION:
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum X}{n}$$
$$\bar{X} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9}$$
$$\bar{X} = 30.7 \text{ days}$$
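The same computation can be sketched in Python. This is only an illustration of the formula above; the variable names are assumptions, and the nine values are copied from the solution.

```python
from statistics import mean

# Days off per year, copied from the solution of Sample Problem 1.
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]

total = sum(days_off)        # sum of the values
n = len(days_off)            # total number of values in the sample
sample_mean = total / n      # mean = sum of values / number of values

print(total, n, round(sample_mean, 1))

# The standard library gives the same result.
print(round(mean(days_off), 1))
```

Rounding to one decimal place follows the worked solution; the unrounded quotient is 276/9.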
SOLUTION:
A               B               C               D
Class           Frequency (f)   Midpoint (Xm)   f * Xm
5.5 - 10.5      1
10.5 - 15.5     2
15.5 - 20.5     3
20.5 - 25.5     5
25.5 - 30.5     4
30.5 - 35.5     3
35.5 - 40.5     2
                n = 20
Xm = 8 Xm = 13
Xm = 18 Xm = 23
Xm = 28 Xm = 33
Xm = (35.5 + 40.5) ÷ 2 = 76 ÷ 2
Xm = 38
A               B               C               D
Class           Frequency (f)   Midpoint (Xm)   f * Xm
5.5 - 10.5      1               8               8
10.5 - 15.5     2               13              26
15.5 - 20.5     3               18              54
20.5 - 25.5     5               23              115
25.5 - 30.5     4               28              112
30.5 - 35.5     3               33              99
35.5 - 40.5     2               38              76
                n = 20                          ∑(f * Xm) = 490
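The table can be reproduced with a short Python sketch. The final division is not shown in the surviving solution, so the last line assumes the usual grouped-data formula, mean = ∑(f * Xm) / n; the variable names are illustrative.

```python
# (lower boundary, upper boundary, frequency) for each class in the table.
classes = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]

n = sum(f for _, _, f in classes)                            # column B total
midpoints = [(low + high) / 2 for low, high, _ in classes]   # column C
sum_f_xm = sum(f * xm for (_, _, f), xm in zip(classes, midpoints))  # column D total

# Assumed final step: divide the column D total by n.
grouped_mean = sum_f_xm / n

print(midpoints)
print(n, sum_f_xm, grouped_mean)
```

Because the raw values inside each class are unknown, the midpoint stands in for every value in its class, so the result is an approximation of the true mean.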
b. THE MEDIAN
The median is the midpoint of the data array. A data array is a data set that has been
ordered or arranged. Its symbol is MD.
Here are the steps in getting the median of the data array:
STEP 1: Arrange the data in order.
STEP 2: Select the middle point
SAMPLE PROBLEM 3: Six customers purchased these numbers of hardbound books:
1, 7, 3, 3, 4, 2. Find the median.
STEP 1: Arrange the data in order
1, 2, 3, 3, 4, 7
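The two steps can be sketched in Python as follows. The rule for an even number of values (average the two middle values) is the standard convention and is an assumption here, since the second step of the worked solution did not survive.

```python
from statistics import median

books = [1, 7, 3, 3, 4, 2]

# STEP 1: arrange the data in order.
ordered = sorted(books)

# STEP 2: select the middle point.
n = len(ordered)
mid = n // 2
if n % 2 == 1:
    md = ordered[mid]                            # odd count: single middle value
else:
    md = (ordered[mid - 1] + ordered[mid]) / 2   # even count: mean of the two middle values

print(ordered, md)

# Cross-check against the standard library.
print(median(books))
```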
c. THE MODE
The mode is the value that occurs most often in a set of data. Take note that a set of
data can have more than one mode or no mode at all.
Unimodal- a data set with one value that occurs with the greatest frequency
Bimodal- a data set with two values that occur with the same greatest frequency
Multimodal- a data set with more than two values with the same greatest frequency
SAMPLE PROBLEM 5: Find the mode for the number of teachers per school for
10 selected secondary schools in Virac, Catanduanes.
25, 110, 234, 5, 78, 74, 15, 22, 45, 30
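A small Python sketch of the definitions above is shown here. The helper name `modes` is hypothetical, and treating "every value occurs exactly once" as "no mode" is the convention assumed in this sketch. The second data set is made up purely to show a bimodal case.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means there is no mode."""
    counts = Counter(values)
    top = max(counts.values())      # greatest frequency in the data set
    if top == 1:
        return []                   # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

teachers = [25, 110, 234, 5, 78, 74, 15, 22, 45, 30]
print(modes(teachers))              # Sample Problem 5 data

illustration = [1, 2, 2, 3, 3]      # hypothetical data, not from the module
print(modes(illustration))
```

A result with one value corresponds to a unimodal data set, two values to a bimodal one, and more than two to a multimodal one.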
25.5- 30.5 4
30.5- 35.5 3
35.5- 40.5 2
n = 20
Answer: 5
d. THE MIDRANGE
It is a very rough estimate of the average. It is the sum of the highest value and the
lowest value in the data set, divided by 2. However, the midrange is strongly affected
by an extremely high or low value in the data set. Its symbol is MR.
To get the midrange, simply add the highest value and lowest value in the data
set and divide the sum by 2.
SOLUTION:
MR = (49 + 12)/ 2
MR = 30.5
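The solution can be checked with a few lines of Python. The problem statement for this solution did not survive, so the data set below is an assumption: it is the list of ten examination scores from Sample Problem 10, whose highest and lowest values (49 and 12) match the ones used here.

```python
# Assumed data set: the ten scores listed in Sample Problem 10.
scores = [45, 40, 32, 12, 41, 18, 20, 37, 30, 49]

highest = max(scores)
lowest = min(scores)
midrange = (highest + lowest) / 2   # MR = (highest value + lowest value) / 2

print(highest, lowest, midrange)
```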
The Median
1. The median is used to find the center or middle value of a data set.
2. The median is used when it is necessary to find out whether the data values fall into the
upper half or lower half of the distribution.
3. The median is used for an open-ended distribution.
4. The median is affected less than the mean by extremely high or extremely low values.
The Mode
1. The mode is used when the most typical case is desired.
2. The mode is the easiest average to compute.
3. The mode can be used when the data are nominal, such as religious preference,
gender, or political affiliation.
4. The mode is not always unique. A data set can have more than one mode, or the mode
may not exist for a data set.
The Midrange
1. The midrange is easy to compute.
2. The midrange gives the midpoint.
3. The midrange is affected by extremely high or low values in a data set.
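The different sensitivity to extreme values noted in these lists can be seen numerically. The sketch below reuses the nine values from Sample Problem 1 and appends a hypothetical extreme value of 200, which is an assumption for illustration and not part of the source data.

```python
from statistics import mean, median

def midrange(values):
    # Midrange = (highest value + lowest value) / 2
    return (max(values) + min(values)) / 2

base = [20, 26, 40, 36, 23, 42, 35, 24, 30]   # Sample Problem 1 data
with_outlier = base + [200]                   # hypothetical extreme value

for label, data in (("original", base), ("with outlier", with_outlier)):
    print(label, round(mean(data), 1), median(data), midrange(data))
```

In this sketch the median moves only slightly when the extreme value is added, the mean moves noticeably, and the midrange moves the most.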
20 14 14 17 17
18 12 14 18 8
15 15 15 10 11
10 18 12 15 9
2. Consider the frequency distribution table below to compute the mean and the
modal class. The data represent the ages of a sample of 50 voters.
MEASURE OF VARIATION
Another set of descriptive statistics is the measures of variation. These measure
how spread out a set of data is. Data with less variability are more consistent.
The measures of variation include the range, variance, and standard deviation.
THE RANGE
It is the difference between the highest value and the lowest value in the data set. Its
symbol is R.
R = highest value – lowest value
SOLUTION:
R = 49 – 12
R = 37
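A one-line check in Python follows. As the problem statement is missing, the data set is assumed to be the ten scores from Sample Problem 10, whose extremes match the 49 and 12 in the solution.

```python
# Assumed data set: the ten scores listed in Sample Problem 10.
scores = [45, 40, 32, 12, 41, 18, 20, 37, 30, 49]

# R = highest value - lowest value
data_range = max(scores) - min(scores)

print(data_range)
```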
The variance is the average of the squares of the distance each value is from the
mean. Its symbol is σ² (σ is the Greek lowercase letter sigma). The formula for the
population variance is:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
where: X is the individual value
µ is the population mean
N is the population size
The standard deviation is the square root of the variance. The symbol for the
population standard deviation is σ.
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
SAMPLE PROBLEM 9: A testing lab wishes to test two experimental brands of outdoor
paint to see how long each will last before fading. The testing lab makes 6 gallons
of each paint to test. Since different chemical agents are added to each group and
only six cans are involved, these two groups constitute two small populations. The
results (in months) are shown.
Brand A     Brand B
10          35
60          45
50          40
40          35
30          30
20          25
Compute the variance and standard deviation for each brand.
SOLUTIONS:
BRAND A                                  BRAND B
µ = (10 + 60 + 50 + 40 + 30 + 20) / 6    µ = (35 + 45 + 40 + 35 + 30 + 25) / 6
µ = 210/6                                µ = 210/6
µ = 35                                   µ = 35
BRAND A              BRAND B
10 – 35 = -25        35 – 35 = 0
60 – 35 = 25         45 – 35 = 10
50 – 35 = 15         40 – 35 = 5
40 – 35 = 5          35 – 35 = 0
30 – 35 = -5         30 – 35 = -5
20 – 35 = -15        25 – 35 = -10
BRAND A                           BRAND B
10 – 35 = -25, (-25)² = 625       35 – 35 = 0, 0² = 0
60 – 35 = 25, 25² = 625           45 – 35 = 10, 10² = 100
50 – 35 = 15, 15² = 225           40 – 35 = 5, 5² = 25
40 – 35 = 5, 5² = 25              35 – 35 = 0, 0² = 0
30 – 35 = -5, (-5)² = 25          30 – 35 = -5, (-5)² = 25
20 – 35 = -15, (-15)² = 225       25 – 35 = -10, (-10)² = 100
BRAND A                BRAND B
∑(X – µ)² = 1750       ∑(X – µ)² = 250
STEP 5: Get the population variance by dividing the sum of the squares by the
population size
BRAND A                BRAND B
σ² = 1750 ÷ 6          σ² = 250 ÷ 6
σ² = 291.7             σ² = 41.7
STEP 6: Get the population standard deviation by taking the square root of the
variance
BRAND A BRAND B
σ = 17.1 σ = 6.5
Interpretation: It is interesting to note that the two brands have the same mean which
is 35 months. However, even if they have the same mean they don’t have the same
variance and standard deviation.
What does this mean? Brand A has higher standard deviation than Brand B. Therefore,
the data obtained for Brand B is more consistent and less varied than Brand A.
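The whole procedure for Sample Problem 9 can be sketched in Python. The function name `population_variance` is illustrative; the code follows the population formula given earlier (divide by N, not N – 1).

```python
from math import sqrt
from statistics import pvariance

def population_variance(values):
    """Average of the squared distances of each value from the mean."""
    mu = sum(values) / len(values)                  # population mean
    squared = [(x - mu) ** 2 for x in values]       # subtract the mean, then square
    return sum(squared) / len(values)               # divide the sum by N

brand_a = [10, 60, 50, 40, 30, 20]
brand_b = [35, 45, 40, 35, 30, 25]

var_a = population_variance(brand_a)
var_b = population_variance(brand_b)
sd_a = sqrt(var_a)
sd_b = sqrt(var_b)

print(round(var_a, 1), round(sd_a, 1))
print(round(var_b, 1), round(sd_b, 1))

# Cross-check with the standard library's population variance.
print(round(pvariance(brand_a), 1), round(pvariance(brand_b), 1))
```

Both brands have a mean of 35 months, so the smaller standard deviation for Brand B is what distinguishes the two data sets.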
We can get the variance of a sample using the formula below. Its symbol is s².
$$s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}$$
where:
n is the sample size
∑X is the sum of all the values in the data set
∑X² is the sum of the squares of all the values in the data set
s² is the sample variance
The sample standard deviation is the square root of the sample variance. Its symbol is
s.
SAMPLE PROBLEM 10: Find the sample variance and standard deviation from
the quarterly examination scores of ten Grade 9 STE students.
SOLUTIONS.
∑X = 45 + 40 + 32 + 12 + 41 + 18 + 20 + 37 + 30 + 49
∑X = 324
STEP 2: Square the values in the data set, then get the sum
∑X² = 45² + 40² + 32² + 12² + 41² + 18² + 20² + 37² + 30² + 49²
∑X² = 2025 + 1600 + 1024 + 144 + 1681 + 324 + 400 + 1369 + 900 + 2401
∑X² = 11868
STEP 3: Substitute the values in the equation to compute for the sample variance
$$s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}$$
$$s^2 = \frac{10(11868) - (324)^2}{10(10 - 1)}$$
$$s^2 = \frac{118680 - 104976}{10(9)}$$
$$s^2 = \frac{13704}{90}$$
$$s^2 = 152.3$$
$$s = \sqrt{s^2}$$
$$s = \sqrt{152.3}$$
$$s = 12.3$$
Take note that (∑X)² is different from ∑X².
(∑X)² means that you add the values first and
then square the sum, while ∑X² means that you
square the values first and then add them.
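The steps of Sample Problem 10, including the difference between (∑X)² and ∑X², can be sketched in Python. Variable names are illustrative; the formula is the sample variance formula given above.

```python
from math import sqrt
from statistics import variance

scores = [45, 40, 32, 12, 41, 18, 20, 37, 30, 49]
n = len(scores)

sum_x = sum(scores)                       # ∑X: add the values
sum_x_sq = sum(x * x for x in scores)     # ∑X²: square each value, then add
square_of_sum = sum_x ** 2                # (∑X)²: add first, then square the sum

# s² = [n(∑X²) - (∑X)²] / [n(n - 1)]
s2 = (n * sum_x_sq - square_of_sum) / (n * (n - 1))
s = sqrt(s2)

print(sum_x, sum_x_sq, square_of_sum)
print(round(s2, 1), round(s, 1))

# Cross-check with the standard library's sample variance.
print(round(variance(scores), 1))
```

The two sums differ by a wide margin for this data set, which is why they cannot be swapped in the formula.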
s = √s²
SOLUTION:
A               B               C               D           E
Class           Frequency (f)   Midpoint (Xm)   f * Xm      f * Xm²
5.5 - 10.5      1
10.5 - 15.5     2
15.5 - 20.5     3
20.5 - 25.5     5
25.5 - 30.5     4
30.5 - 35.5     3
35.5 - 40.5     2
                n = 20
STEP 2: Get the midpoint by adding the lower and upper class boundaries of the interval,
then dividing the sum by two. Write your answer in column C.
A               B               C               D           E
Class           Frequency (f)   Midpoint (Xm)   f * Xm      f * Xm²
5.5 - 10.5      1               8
10.5 - 15.5     2               13
15.5 - 20.5     3               18
20.5 - 25.5     5               23
25.5 - 30.5     4               28
30.5 - 35.5     3               33
35.5 - 40.5     2               38
                n = 20
STEP 3: Multiply the frequency (f) and midpoint (Xm). Write your answer on column D
and get the sum.
A               B               C               D           E
Class           Frequency (f)   Midpoint (Xm)   f * Xm      f * Xm²
5.5 - 10.5      1               8               8
10.5 - 15.5     2               13              26
15.5 - 20.5     3               18              54
20.5 - 25.5     5               23              115
25.5 - 30.5     4               28              112
30.5 - 35.5     3               33              99
35.5 - 40.5     2               38              76
                n = 20                          ∑(f * Xm) = 490
STEP 4: Square the midpoint and multiply it by the frequency. Write your answer in
column E.
A               B               C               D           E
Class           Frequency (f)   Midpoint (Xm)   f * Xm      f * Xm²
5.5 - 10.5      1               8               8           64
10.5 - 15.5     2               13              26          338
15.5 - 20.5     3               18              54          972
20.5 - 25.5     5               23              115         2645
25.5 - 30.5     4               28              112         3136
30.5 - 35.5     3               33              99          3267
35.5 - 40.5     2               38              76          2888
                n = 20                                      ∑(f * Xm²) = 13310
$$s^2 = \frac{20(13310) - (490)^2}{20(19)}$$
$$s^2 = \frac{26100}{380}$$
$$s^2 = 68.7$$
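The grouped-data computation can be sketched in Python. The general formula for grouped data did not survive in the text, so the sketch assumes the form implied by the numbers in the solution, s² = [n ∑(f * Xm²) – (∑(f * Xm))²] / [n(n – 1)]. The final standard deviation is not given in the text either and is computed here only as the square root of s².

```python
from math import sqrt

# (midpoint Xm, frequency f) pairs taken from columns C and B of the table.
rows = [(8, 1), (13, 2), (18, 3), (23, 5), (28, 4), (33, 3), (38, 2)]

n = sum(f for _, f in rows)                       # column B total
sum_f_xm = sum(f * xm for xm, f in rows)          # column D total
sum_f_xm_sq = sum(f * xm ** 2 for xm, f in rows)  # column E total

# Assumed grouped-data formula, consistent with 26100/380 in the solution.
s2 = (n * sum_f_xm_sq - sum_f_xm ** 2) / (n * (n - 1))
s = sqrt(s2)

print(n, sum_f_xm, sum_f_xm_sq)
print(round(s2, 1), round(s, 1))
```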
2. They are used to test the consistency of a variable. The smaller the variance
and standard deviation, the more consistent the variable.
3. They are also used to determine if the values fall within the specified interval
in a distribution.
4. They are used in inferential statistics.
Adapted from “Elementary Statistics” by Allan G. Bluman p. 132
Discrete or Discontinuous- a variable that assumes values that can be counted.
It cannot take decimal values.
Qualitative Data- variables that can be categorized according to some
characteristics or attributes
Quantitative Data- these are numerical in nature and can be obtained
through measuring and counting
Measurement is used to convert qualitative data into quantitative data so
that they can be treated statistically.
Population- the totality of subjects (individuals, objects, places, reactions, and
events) with common characteristics that are being studied
Sample- a group of subjects which is selected from a population
Parameter- characteristics or measurements obtained by using all the values in
a specific population
Statistic- characteristics or measurements obtained by using all the values from
a sample
Mean- the sum of the values, divided by the total number of values.
Median- it is the midpoint of the data array.
Data array- a data set that is ordered or arranged.
Mode- the value that occurs the most in a set of data.
Midrange- a very rough estimate of the average. It is the sum of the highest
value and lowest value in the data set, divided by 2.
Range- It is the difference between the highest value and the lowest value in
the data set.
Variance- is the average of the squares of the distance each value is from the
mean
Standard deviation- is the square root of the variance