Intro to Statistics for Researchers


DESCRIPTIVE STATISTICS

Statistics is a branch of mathematics that deals with the collection, organization, analysis and interpretation of quantitative data and with such problems as experimental design and decision-making. (Roc-Narag, 2010)

Statistics is the language of research. Statistics gives meaning to the data collected during the research proper. It helps the researcher describe the data accurately in terms of the computed mean, standard deviation and relationship with other factors. But how can statistics help a researcher like you analyze the data that you have collected? This module will show you how to obtain and interpret descriptive statistics such as the measures of average and the measures of variation. Have fun learning all of these by reading and answering the prepared activities in this module.
At the end of the module, you should be able to:

• Differentiate between the following:
a. descriptive statistics and inferential statistics
b. population and sample
c. parameter and statistic
d. quantitative data and qualitative data
• Classify variables according to functional relationship
and continuity of values
• Classify data according to the levels of measurement
• Summarize data using measures of central tendency such as the mean, median, mode and midrange
• Describe data using measures of variation such as the range, variance and standard deviation
• Explain the properties and uses of the different statistical tools used in
descriptive research

[Figure: mean, median, mode, midrange; range, standard deviation, variance]

Statistics is needed in research because it gives meaning to the data collected during the process. It helps the researcher draw conclusions and answer the questions posed at the start of the research.

The four steps in statistics include:

a. collection of data- the process of gathering information through experiments, interviews, surveys, questionnaires, etc.
b. presentation of data- organizing the collected data into tables, graphs and text
c. analysis of data- the process of extracting important information from the collected data using an appropriate statistical tool
d. interpretation of data- the process of getting answers from the analyzed data. Conclusions about a large group can be formulated from the data gathered from a small group.

Statistics can be further divided into two branches:

1. Descriptive Statistics- the branch of statistics that deals with the collection, summarization and presentation of data
2. Inferential Statistics- the branch of statistics that deals with generalizing from samples to populations, testing hypotheses, determining relationships among variables and making predictions of future outcomes

Research deals with variables. Variables are attributes that can assume different values. They can be classified as follows:

1. According to Functional Relationships
a. Independent Variable- a variable that can be controlled or manipulated
b. Dependent Variable- a variable that is not controlled or manipulated directly; it is measured to see how it responds to changes in the independent variable

2. According to Continuity of Values
a. Continuous Variable- a variable obtained by measuring that can assume any value within a range, including decimals
Examples: height, mass, distance, voltage, electric current
b. Discrete or Discontinuous Variable- a variable whose values are obtained by counting; it cannot take decimal values
Examples: number of samples, number of seeds, number of students

Data are the observations or measurements of variables. There are two kinds of data that can be observed or measured:

1. Qualitative Data- data that can be categorized according to some characteristic or attribute
Examples: Sex- male, female
Year Level- Grade 7, Grade 8, Grade 9, Grade 10 …
Employment Status- regular, probationary, part-time
2. Quantitative Data- data that are numerical in nature and can be obtained through measuring and counting
Examples: mass - 10 kg, 500 mg
temperature – 37.5°C, 273.15 K
height - 150 cm, 5 feet

Measurement is used to convert qualitative data into quantitative data so that they can be treated statistically.

Example:

Qualitative data: size of an object (small, medium, large)
Quantitative data: measure the actual length, height and width of the object; say, the height of the plant is 12 cm.
Levels of Measurement

Variables can be classified according to how they are categorized, ordered or counted.

✓ Nominal level of measurement

Data at this level are categories that are not ordered or ranked.

Example: gender (male/female)

civil status (single, married, widowed, separated)

zip code (4800, 4801, 4802, 4803, 4804, 4805)

✓ Ordinal level of measurement

Data at this level can be ranked or ordered, but the exact differences between the ranks do not exist or cannot be measured.

Example: size of a body (small, medium, large)

Type of quiz bee test (easy, average, difficult)

Rating scale (excellent, very satisfactory, satisfactory, needs improvement)

✓ Interval level of measurement

Data at this level can also be ranked or ordered, and the exact differences between values are meaningful. However, there is no true zero.

Example: temperature in °C (the difference between 37°C and 38°C is exactly 1°C), but 0°C does not mean the absence of heat. In physics, the true zero of temperature is absolute zero (0 K), which, according to the Third Law of Thermodynamics, cannot be reached.

IQ test (a score of 110 is one point higher than a score of 109), but you can’t say that an unintelligent person has an IQ of zero.

✓ Ratio level of measurement

Data at this level can also be ranked or ordered, and the exact differences between values are meaningful. In addition, there is a meaningful (true) zero, so ratios are also meaningful; for example, 20 kg is twice as heavy as 10 kg.

Examples: salary, weight, velocity, displacement, mass, time, force, speed

You are now familiar with the different terms that you will encounter as we go further in our study of statistics. Remember that the type of data gathered is important in determining the type of statistical test needed to answer research questions. This time, let us learn the difference between a population and a sample (the two sources from which we usually get the information needed in the conduct of research). Hop on!

POPULATION VS SAMPLE

Population- the totality of subjects (individuals, objects, places, reactions and events) with common characteristics that are being studied

Sample - a group of subjects which is selected from a population

PARAMETER VS STATISTIC (statistic, without the s)

Parameter- a characteristic or measurement obtained by using all the values in a specific population

Statistic- a characteristic or measurement obtained by using all the values from a sample

Researchers may draw samples from a population to gain information. However, if the population is small, it is not necessary to use samples, since data can be gathered from the entire population to answer the research questions.

Let us discuss the first type of statistics: descriptive statistics. Researchers use descriptive statistics to describe a situation based on the data. Once the data are collected through surveys, interviews, experiments and other means, they are summarized and organized into other forms, like tables, graphs and charts, so that we can see more clearly what the data mean. Aside from tables, graphs and charts, different statistical methods are also used to summarize data. These include the measures of average (also called measures of central tendency), measures of variation and measures of position. We will only be discussing the first two methods. Ready? Here goes…
MEASURES OF CENTRAL TENDENCY

The measures of central tendency, also known as the measures of average, include the mean, median, mode and midrange.

a. THE MEAN

The mean is the sum of the values divided by the total number of values. The symbol X̄ (read "X bar") represents the sample mean.

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum X}{n}$$

where n represents the total number of values in the sample. For a population, the Greek letter μ (mu) is used for the mean.

$$\mu = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum X}{N}$$

where N represents the total number of values in the population.

You can solve for the mean by simply adding the values of the data and dividing by
the total number of values.

SAMPLE PROBLEM 1
The data represent the number of days off per year for a sample of individuals selected from nine different countries. Find the mean.

20, 26, 40, 36, 23, 42, 35, 24, 30

SOLUTION:

$$\bar{X} = \frac{\sum X}{n} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} \approx 30.7 \text{ days}$$
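The arithmetic in Sample Problem 1 can be checked with a short Python sketch. The helper name `sample_mean` is our own, chosen for illustration; `statistics.mean` is part of Python's standard library and should give the same result.

```python
from statistics import mean

# Number of days off per year (Sample Problem 1)
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]


def sample_mean(values):
    """Sample mean: the sum of the values divided by n."""
    return sum(values) / len(values)


x_bar = sample_mean(days_off)
print(sum(days_off), len(days_off))  # 276 9
print(round(x_bar, 1))               # 30.7
print(round(mean(days_off), 1))      # 30.7 (standard-library check)
```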

SAMPLE PROBLEM 2
Consider the frequency distribution table below to compute the mean. The data represent the number of miles run during one week for a sample of 20 runners.

SOLUTION:

STEP 1: Make a table as shown.

A             B               C              D
Class         Frequency (f)   Midpoint Xm    f * Xm
5.5 - 10.5    1
10.5 - 15.5   2
15.5 - 20.5   3
20.5 - 25.5   5
25.5 - 30.5   4
30.5 - 35.5   3
35.5 - 40.5   2
              n = 20

STEP 2: Find the midpoint Xm of each class (Column C).

Xm = (5.5 + 10.5) ÷ 2 = 16 ÷ 2 = 8
Xm = (10.5 + 15.5) ÷ 2 = 26 ÷ 2 = 13
Xm = (15.5 + 20.5) ÷ 2 = 36 ÷ 2 = 18
Xm = (20.5 + 25.5) ÷ 2 = 46 ÷ 2 = 23
Xm = (25.5 + 30.5) ÷ 2 = 56 ÷ 2 = 28
Xm = (30.5 + 35.5) ÷ 2 = 66 ÷ 2 = 33
Xm = (35.5 + 40.5) ÷ 2 = 76 ÷ 2 = 38

STEP 3: Multiply the midpoint and the frequency (Column D)

STEP 4: Add the values in column D

STEP 5: Divide the sum by n to get the mean

A             B               C              D
Class         Frequency (f)   Midpoint Xm    f * Xm
5.5 - 10.5    1               8              8
10.5 - 15.5   2               13             26
15.5 - 20.5   3               18             54
20.5 - 25.5   5               23             115
25.5 - 30.5   4               28             112
30.5 - 35.5   3               33             99
35.5 - 40.5   2               38             76
              n = 20                         Σ(f * Xm) = 490

$$\bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{490}{20} = 24.5 \text{ miles}$$
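Steps 1 to 5 of Sample Problem 2 can be sketched in Python as follows. The sketch assumes each class is stored as a (lower boundary, upper boundary, frequency) tuple, a layout chosen for illustration; the function name `grouped_mean` is likewise hypothetical. Like the hand computation, it treats every value in a class as if it were equal to the class midpoint, so the result is an estimate of the mean of the raw data.

```python
# Sample Problem 2: (lower boundary, upper boundary, frequency)
classes = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]


def grouped_mean(table):
    """Estimate the mean of grouped data as sum(f * Xm) / n."""
    n = sum(f for _, _, f in table)                  # total frequency
    sum_fx = 0.0
    for lower, upper, f in table:
        xm = (lower + upper) / 2                     # Column C: midpoint
        sum_fx += f * xm                             # Column D: f * Xm
    return sum_fx / n                                # Step 5: divide by n


print(grouped_mean(classes))  # 24.5
```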

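b. THE MEDIAN
The median is the midpoint of the data array. A data array is a data set that has been ORDERED/ARRANGED. Its symbol is MD.
Here are the steps in getting the median of a data array:
STEP 1: Arrange the data in order.
STEP 2: Select the middle point. If the number of values is odd, the median is the middle value; if it is even, the median is the mean of the two middle values.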
SAMPLE PROBLEM 3
Find the median. Six customers purchased these numbers of hardbound books: 1, 7, 3, 3, 4, 2

STEP 1: Arrange the data in order.
1, 2, 3, 3, 4, 7

STEP 2: Select the middle point. Since there are six values (an even number), average the two middle values.

MD = (3 + 3) ÷ 2 = 6 ÷ 2 = 3

SAMPLE PROBLEM 4
Find the median. The data give the number of typhoons in the Philippines for the past seven years: 20, 18, 25, 23, 17, 19, 16

STEP 1: Arrange the data in order.

16, 17, 18, 19, 20, 23, 25

STEP 2: Select the middle point (the 4th of the seven values).

MD = 19
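The two median rules (odd n: take the middle value; even n: take the mean of the two middle values) can be sketched in Python. `median_of` is a hypothetical helper written for this module; `statistics.median` from the standard library applies the same rules and is used here as a cross-check.

```python
from statistics import median


def median_of(values):
    """Median: sort the data (data array), then take the middle value,
    or the mean of the two middle values when n is even."""
    ordered = sorted(values)          # STEP 1: arrange the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two


books = [1, 7, 3, 3, 4, 2]                # Sample Problem 3 (n = 6)
typhoons = [20, 18, 25, 23, 17, 19, 16]   # Sample Problem 4 (n = 7)

print(median_of(books), median(books))        # 3.0 3.0
print(median_of(typhoons), median(typhoons))  # 19 19
```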
c. THE MODE

The mode is the value that occurs most often in a set of data. Take note that a set of data can have more than one mode or no mode at all.

Unimodal- a data set with one value that occurs with the greatest frequency

Bimodal- a data set with two values that occur with the same greatest frequency

Multimodal- a data set with more than two values with the same greatest frequency

SAMPLE PROBLEM 5
Find the mode for the number of teachers per school for 10 selected secondary schools in Virac, Catanduanes.

25, 110, 234, 5, 78, 74, 15, 22, 45, 30

Answer: The data set has no mode, since no value occurs more than once. It is wrong to answer 0 (zero), because zero can be an actual data value for some quantities; for example, 0°C is a real temperature reading.

SAMPLE PROBLEM 6
Find the modal class (the term used when the mode is taken from grouped data).

A             B
Class         Frequency (f)
5.5 - 10.5    1
10.5 - 15.5   2
15.5 - 20.5   3
20.5 - 25.5   5
25.5 - 30.5   4
30.5 - 35.5   3
35.5 - 40.5   2
              n = 20

Answer: The modal class is 20.5 - 25.5, since it has the highest frequency (5).
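A minimal Python sketch of the mode and the modal class follows. The helper names `modes` and `modal_classes` are hypothetical. The sketch assumes the convention used in Sample Problem 5: a data set in which no value repeats has no mode, so the function returns an empty list in that case instead of 0. `statistics.multimode` is not used here because, for data with no repeated value, it returns every value rather than signalling that there is no mode.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest frequency.
    Return an empty list (no mode) when no value occurs more than once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # no mode; do NOT report 0
    return [v for v, c in counts.items() if c == highest]


def modal_classes(table):
    """Return the class interval(s) with the highest frequency."""
    highest = max(f for _, f in table)
    return [interval for interval, f in table if f == highest]


teachers = [25, 110, 234, 5, 78, 74, 15, 22, 45, 30]   # Sample Problem 5
print(modes(teachers))          # []
print(modes([1, 2, 2, 3, 3]))   # [2, 3] (a bimodal example)

freq_table = [("5.5-10.5", 1), ("10.5-15.5", 2), ("15.5-20.5", 3),
              ("20.5-25.5", 5), ("25.5-30.5", 4), ("30.5-35.5", 3),
              ("35.5-40.5", 2)]                        # Sample Problem 6
print(modal_classes(freq_table))  # ['20.5-25.5']
```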

d. THE MIDRANGE

It is a very rough estimate of the average. It is the sum of the highest value and the lowest value in the data set, divided by 2. However, the midrange depends greatly on extremely high or low values in a data set. Its symbol is MR.

In getting the midrange, simply add the highest value and the lowest value in the data set and divide the sum by 2.

$$MR = \frac{\text{highest value} + \text{lowest value}}{2}$$

SAMPLE PROBLEM 7
Find the midrange of the quarterly examination scores of ten Grade 9 STE students.

45, 40, 32, 12, 41, 18, 20, 37, 30, 49

SOLUTION:

MR = (49 + 12) ÷ 2 = 61 ÷ 2

MR = 30.5
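The midrange formula can be sketched in Python. The helper name `midrange` is our own. The second call replaces the highest score with a hypothetical extreme value of 100 to show how strongly the midrange depends on the extremes.

```python
def midrange(values):
    """Midrange: (highest value + lowest value) / 2."""
    return (max(values) + min(values)) / 2


# Quarterly examination scores (Sample Problem 7)
scores = [45, 40, 32, 12, 41, 18, 20, 37, 30, 49]
print(midrange(scores))  # 30.5

# Hypothetical: replace the top score (49, the last item) with an extreme value of 100
with_outlier = scores[:-1] + [100]
print(midrange(with_outlier))  # 56.0
```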

So far, how do you find the measures of central tendency? Isn’t it easy? Don’t worry if some of it seems confusing; you’ll understand it better as you apply it in your actual research. Read the table below, adapted from Allan G. Bluman’s book Elementary Statistics. I hope it helps you further understand the measures of central tendency. Keep going!

Uses and Properties of the Measures of Central Tendency


The Mean
1. The mean is found by using all the values of the data.
2. The mean varies less than the median or mode when samples are taken from the same
population and all three measures are computed for these samples.
3. The mean is used in computing other statistics, such as the variance.
4. The mean for the data set is unique and not necessarily one of the data values.
5. The mean cannot be computed for the data in a frequency distribution that has an
open-ended class.

The Median
1. The median is used to find the center or middle value of a data set.
2. The median is used when it is necessary to find out whether the data values fall into the
upper half or lower half of the distribution.
3. The median is used for an open-ended distribution.
4. The median is affected less than the mean by extremely high or extremely low values.

The Mode
1. The mode is used when the most typical case is desired.
2. The mode is the easiest average to compute.
3. The mode can be used when the data are nominal, such as religious preference,
gender, or political affiliation.
4. The mode is not always unique. A data set can have more than one mode, or the mode
may not exist for a data set.

The Midrange
1. The midrange is easy to compute.
2. The midrange gives the midpoint.
3. The midrange is affected by extremely high or low values in a data set.

Adapted from “Elementary Statistics” by Allan G. Bluman p. 116


Find the mean, median, mode and midrange.
Write your solution on a separate sheet of paper.

1. The scores of 20 Grade 9-STE students on the 20-item first summative test in Science are listed below.

20 14 14 17 17
18 12 14 18 8
15 15 15 10 11
10 18 12 15 9

2. Consider the frequency distribution table below to compute the mean and the modal class. The data represent the ages of a sample of 50 voters.

Age of Voters   Frequency (f)
18 - 23         8
24 - 29         10
30 - 35         15
36 - 41         2
42 - 47         12
48 - 53         3
                n = 50

MEASURES OF VARIATION

Another statistical tool in descriptive research is the measure of variation, which describes how spread out a set of data is. Data with less variability are more consistent. The measures of variation include the range, variance and standard deviation.

THE RANGE

It is the difference between the highest value and the lowest value in the data set. Its symbol is R.

R = highest value – lowest value

The range is also largely affected by extremely high or low values.

SAMPLE PROBLEM 8
Find the range of the quarterly examination scores of ten Grade 9 STE students.
45, 40, 32, 12, 41, 18, 20, 37, 30, 49

SOLUTION:

R = highest value – lowest value

R = 49 – 12

R = 37
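The range is a one-line computation in Python. The helper is named `data_range` (a hypothetical name) so that it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Range R = highest value - lowest value."""
    return max(values) - min(values)


# Quarterly examination scores (Sample Problem 8)
scores = [45, 40, 32, 12, 41, 18, 20, 37, 30, 49]
print(data_range(scores))  # 37
```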

POPULATION VARIANCE AND STANDARD DEVIATION

Another meaningful method of measuring the variability of a given data set is through the variance and standard deviation. We will start with the population variance and standard deviation.

The variance is the average of the squares of the distances of each value from the mean. Its symbol is σ² (σ is the Greek lowercase letter sigma). The formula for the population variance is:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where: X is the individual value
µ is the population mean
N is the population size

The standard deviation is the square root of the variance. The symbol for the population standard deviation is σ.

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
SAMPLE PROBLEM 9
A testing lab wishes to test two experimental brands of outdoor paint to see how long each will last before fading. The testing lab makes 6 gallons of each paint to test. Since different chemical agents are added to each group and only six cans are involved, these two groups constitute two small populations. The results (in months) are shown.

Brand A    Brand B
10         35
60         45
50         40
40         35
30         30
20         25

Compute the variance and standard deviation for each brand.

SOLUTIONS:

STEP 1: Get the population mean (µ) for each brand.

BRAND A: µ = (10 + 60 + 50 + 40 + 30 + 20) ÷ 6 = 210 ÷ 6 = 35
BRAND B: µ = (35 + 45 + 40 + 35 + 30 + 25) ÷ 6 = 210 ÷ 6 = 35

STEP 2: Subtract the mean from each data value.

BRAND A            BRAND B
10 – 35 = -25      35 – 35 = 0
60 – 35 = 25       45 – 35 = 10
50 – 35 = 15       40 – 35 = 5
40 – 35 = 5        35 – 35 = 0
30 – 35 = -5       30 – 35 = -5
20 – 35 = -15      25 – 35 = -10

STEP 3: Square the differences.

BRAND A            BRAND B
(-25)² = 625       0² = 0
25² = 625          10² = 100
15² = 225          5² = 25
5² = 25            0² = 0
(-5)² = 25         (-5)² = 25
(-15)² = 225       (-10)² = 100

STEP 4: Get the sum of the squares.

BRAND A: 625 + 625 + 225 + 25 + 25 + 225 = 1750
BRAND B: 0 + 100 + 25 + 0 + 25 + 100 = 250

STEP 5: Get the population variance by dividing the sum of the squares by the population size.

BRAND A: σ² = 1750 ÷ 6 = 291.7
BRAND B: σ² = 250 ÷ 6 = 41.7

STEP 6: Get the population standard deviation by taking the square root of the variance.

BRAND A: σ = √291.7 = 17.1
BRAND B: σ = √41.7 = 6.5

Interpretation: It is interesting to note that the two brands have the same mean, which is 35 months. However, even though they have the same mean, they do not have the same variance and standard deviation.

What does this mean? Brand A has a higher standard deviation than Brand B. Therefore, the data obtained for Brand B are more consistent and less varied than those for Brand A.
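Steps 1 to 6 of Sample Problem 9 can be mirrored in a short Python sketch. `population_variance` is a hypothetical helper; `statistics.pvariance` and `statistics.pstdev` are the standard library's population versions (dividing by N) and serve as a cross-check.

```python
import math
from statistics import pstdev, pvariance


def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N, following Steps 1-5 of Sample Problem 9."""
    N = len(values)
    mu = sum(values) / N                          # Step 1: population mean
    squared = [(x - mu) ** 2 for x in values]     # Steps 2-3: deviations, squared
    return sum(squared) / N                       # Steps 4-5: sum, divide by N


brand_a = [10, 60, 50, 40, 30, 20]
brand_b = [35, 45, 40, 35, 30, 25]

for name, data in [("A", brand_a), ("B", brand_b)]:
    var = population_variance(data)
    sd = math.sqrt(var)                           # Step 6: square root
    print(name, round(var, 1), round(sd, 1))
# A 291.7 17.1
# B 41.7 6.5

# The standard library's population functions give the same values.
print(round(pvariance(brand_a), 1), round(pstdev(brand_a), 1))  # 291.7 17.1
```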

SAMPLE VARIANCE AND STANDARD DEVIATION

We can get the variance of a sample using the formula below. Its symbol is s².

$$s^2 = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)}$$

where:
n is the sample size
∑X is the sum of all the values in the data set
∑X² is the sum of the squares of all the values in the data set
s² is the sample variance

The sample standard deviation is the square root of the sample variance. Its symbol is s.

$$s = \sqrt{s^2} = \sqrt{\frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)}}$$
SAMPLE PROBLEM 10
Find the sample variance and standard deviation of the quarterly examination scores of ten Grade 9 STE students.

45, 40, 32, 12, 41, 18, 20, 37, 30, 49

SOLUTIONS:

STEP 1: Add the values in the set of data.

∑X = 45 + 40 + 32 + 12 + 41 + 18 + 20 + 37 + 30 + 49

∑X = 324

STEP 2: Square the values in the data set, then get the sum.

∑X² = 45² + 40² + 32² + 12² + 41² + 18² + 20² + 37² + 30² + 49²

∑X² = 2025 + 1600 + 1024 + 144 + 1681 + 324 + 400 + 1369 + 900 + 2401

∑X² = 11868

STEP 3: Substitute the values into the equation to compute the sample variance.

$$s^2 = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)} = \frac{10(11868) - (324)^2}{10(10 - 1)} = \frac{118680 - 104976}{10(9)} = \frac{13704}{90} \approx 152.3$$

STEP 4: Compute the sample standard deviation.

s = √s² = √152.3 = 12.3

Take note that (∑X)² is different from ∑X². (∑X)² means that you add the values first and then square the sum, while ∑X² means that you square each value first and then add the squares.
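The shortcut formula for s² can be checked in Python. The shortcut is algebraically equivalent to ∑(X − X̄)² / (n − 1), which is what the standard library's `statistics.variance` and `statistics.stdev` compute, so the two results should agree up to floating-point rounding. The helper name `sample_variance_shortcut` is hypothetical.

```python
import math
from statistics import stdev, variance


def sample_variance_shortcut(values):
    """s^2 = [n * sum(X^2) - (sum X)^2] / [n(n - 1)]."""
    n = len(values)
    sum_x = sum(values)                   # (sum X): add first, square later
    sum_x2 = sum(x * x for x in values)   # sum(X^2): square first, then add
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


scores = [45, 40, 32, 12, 41, 18, 20, 37, 30, 49]  # Sample Problem 10
s2 = sample_variance_shortcut(scores)
s = math.sqrt(s2)
print(round(s2, 1), round(s, 1))  # 152.3 12.3

# statistics.variance/stdev divide sum((X - mean)^2) by (n - 1); results agree.
print(round(variance(scores), 1), round(stdev(scores), 1))  # 152.3 12.3
```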

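VARIANCE AND STANDARD DEVIATION FOR GROUPED DATA

$$s^2 = \frac{n\sum\left(f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)}$$

where f is the class frequency, Xm is the class midpoint and n = ∑f is the sample size.

The standard deviation is the square root of this variance.

s = √s²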

SAMPLE PROBLEM 11
Consider the frequency distribution table below to compute the variance and standard deviation. The data represent the number of miles run during one week for a sample of 20 runners.

SOLUTION:

STEP 1: Make a table as shown.

A             B               C              D          E
Class         Frequency (f)   Midpoint Xm    f * Xm     f * Xm²
5.5 - 10.5    1
10.5 - 15.5   2
15.5 - 20.5   3
20.5 - 25.5   5
25.5 - 30.5   4
30.5 - 35.5   3
35.5 - 40.5   2
              n = 20

STEP 2: Get the midpoint by adding the lower and upper class boundaries of the interval, then dividing the sum by two. Write your answer in column C.

Xm = (5.5 + 10.5) ÷ 2 = 16 ÷ 2 = 8

Do this for the other intervals.

A             B               C              D          E
Class         Frequency (f)   Midpoint Xm    f * Xm     f * Xm²
5.5 - 10.5    1               8
10.5 - 15.5   2               13
15.5 - 20.5   3               18
20.5 - 25.5   5               23
25.5 - 30.5   4               28
30.5 - 35.5   3               33
35.5 - 40.5   2               38
              n = 20

STEP 3: Multiply the frequency (f) and the midpoint (Xm). Write your answer in column D and get the sum.

A             B               C              D          E
Class         Frequency (f)   Midpoint Xm    f * Xm     f * Xm²
5.5 - 10.5    1               8              8
10.5 - 15.5   2               13             26
15.5 - 20.5   3               18             54
20.5 - 25.5   5               23             115
25.5 - 30.5   4               28             112
30.5 - 35.5   3               33             99
35.5 - 40.5   2               38             76
              n = 20                         ∑(f * Xm) = 490

STEP 4: Square the midpoint and multiply it by the frequency. Write your answer in column E.

1 × 8² = 64, 2 × 13² = 338, … Do the same with the succeeding intervals, then get the sum.

A             B               C              D          E
Class         Frequency (f)   Midpoint Xm    f * Xm     f * Xm²
5.5 - 10.5    1               8              8          64
10.5 - 15.5   2               13             26         338
15.5 - 20.5   3               18             54         972
20.5 - 25.5   5               23             115        2645
25.5 - 30.5   4               28             112        3136
30.5 - 35.5   3               33             99         3267
35.5 - 40.5   2               38             76         2888
              n = 20                                    ∑(f * Xm²) = 13310

STEP 5: Substitute the values into the equation for the variance.

$$s^2 = \frac{n\sum\left(f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)} = \frac{20(13310) - (490)^2}{20(20 - 1)} = \frac{266200 - 240100}{20(19)} = \frac{26100}{380} \approx 68.7$$

STEP 6: Take the square root to get the standard deviation.

s = √s² = √68.7 = 8.3
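Columns C to E and Steps 5 and 6 of Sample Problem 11 can be sketched in Python. As in the grouped-mean example, each class is stored as a (lower boundary, upper boundary, frequency) tuple, a layout chosen for illustration, and `grouped_sample_variance` is a hypothetical helper name.

```python
import math

# Sample Problem 11: (lower boundary, upper boundary, frequency)
classes = [(5.5, 10.5, 1), (10.5, 15.5, 2), (15.5, 20.5, 3), (20.5, 25.5, 5),
           (25.5, 30.5, 4), (30.5, 35.5, 3), (35.5, 40.5, 2)]


def grouped_sample_variance(table):
    """s^2 = [n * sum(f * Xm^2) - (sum f * Xm)^2] / [n(n - 1)], using class midpoints."""
    n = sum(f for _, _, f in table)
    sum_fx = 0.0    # column D total
    sum_fx2 = 0.0   # column E total
    for lower, upper, f in table:
        xm = (lower + upper) / 2      # column C: midpoint
        sum_fx += f * xm
        sum_fx2 += f * xm ** 2
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


s2 = grouped_sample_variance(classes)
print(round(s2, 1), round(math.sqrt(s2), 1))  # 68.7 8.3
```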

USES OF VARIANCE AND STANDARD DEVIATION

1. They are used to determine the spread of the data and which data set is more variable. The greater the variance and standard deviation, the more variable the data are.
2. They are used to test the consistency of a variable. The smaller the variance and standard deviation, the more consistent the variable.
3. They are also used to determine whether values fall within a specified interval in a distribution.
4. They are used in inferential statistics.
Adapted from “Elementary Statistics” by Allan G. Bluman p. 132

You may be doing your own scientific investigation, which involves collecting, summarizing and analyzing data.

Get the variance and standard deviation of the data that you have gathered. These are the two measures that you will use in hypothesis testing and/or in finding the relationship between your variables, which will be discussed in the next module.

The following terms used in this module are defined as follows:

Descriptive Statistics- the branch of statistics that deals with the collection, summarization and presentation of data
Inferential Statistics- the branch of statistics that deals with generalizing from samples to populations, testing hypotheses, determining relationships among variables and making predictions of future outcomes
Independent Variable- a variable that can be controlled or manipulated
Dependent Variable- a variable that is not controlled or manipulated directly; it is measured to see how it responds to changes in the independent variable
Continuous Variable- a variable obtained by measuring that can assume any value within a range, including decimals
Discrete or Discontinuous Variable- a variable whose values are obtained by counting; it cannot take decimal values
Qualitative Data- data that can be categorized according to some characteristic or attribute
Quantitative Data- data that are numerical in nature and can be obtained through measuring and counting
Measurement is used to convert qualitative data into quantitative data so that they can be treated statistically.
Population- the totality of subjects (individuals, objects, places, reactions and events) with common characteristics that are being studied
Sample- a group of subjects selected from a population
Parameter- a characteristic or measurement obtained by using all the values in a specific population
Statistic- a characteristic or measurement obtained by using all the values from a sample
Mean- the sum of the values divided by the total number of values
Median- the midpoint of the data array
Data array- a data set that has been ORDERED/ARRANGED
Mode- the value that occurs most often in a set of data
Midrange- a very rough estimate of the average; the sum of the highest value and the lowest value in the data set, divided by 2
Range- the difference between the highest value and the lowest value in the data set
Variance- the average of the squares of the distances of each value from the mean
Standard deviation- the square root of the variance
