Data Gathering Part 1
PART 1
1. To understand the processes followed by the researcher and the statistical
tools used by the statistician in the story, study Lessons 1 and 2.
In your research paper, this is the part where you give a narrative
description of the steps undertaken in data gathering. The discussion should
include the specifics of the procedures of the entire research process. In a
research proposal, the verbs are in the future tense. After the data have been
collected, researchers change the verbs to the past tense.
In collecting data, there are ethics to consider. Here are the ethical
considerations:
Protect the data you collected from your respondents. Do not leave
anything with personal information in a place that can be easily accessed by
people who do not need to see the data. If possible, keep the information in a
secure or locked location.
4. Risks
You have to avoid or minimize anything that will cause physical or
emotional harm to your respondents. Make your respondents aware of any
potential harms prior to their involvement in your study.
After analyzing the data gathered, it is always good to share the results with
your respondents.
When you are done with your research paper, you are encouraged to
publish your work. You did not conduct a study to keep the results to yourself
but for people to learn from it. Your work might be useful for future research.
In collecting data, there is a process to follow. The common process for
quantitative research is indicated below:
1. Seek approval from the head of the institution where you are going to
gather data. For schools, ask permission from the school head. For
businesses, ask permission from the manager through a letter.
2. After the approval of the head of the institution, recruitment of
respondents follows. Your target respondents should be aware that they
are your respondents. Explain to them the risks of involvement in
the study, their right to withdraw, and their right to know the results of your study.
3. Once your target respondents have agreed, you may gather data using the
appropriate data-gathering method.
4. Analyze the data using the appropriate statistical tools.
5. Disseminate the results.
The researchers sought the approval of the school president through a letter.
Upon his approval, the approved letter was presented to the instructors for the
scheduling of the administration of the questionnaire.
Before the questionnaires were distributed to the respondents, they were
informed of the risks of involvement in the study, their right to withdraw, and
their right to know the results of the study. They were requested to follow the
given instructions and to ask for assistance if they needed help.
The questionnaires were retrieved on the same day. The responses of the
respondents were tabulated and analyzed.
Sample letter to the president:
Sir:
We, the student researchers of Baguio City National Science High School, are
currently conducting a study entitled “Factors Influencing the Choice of Course
among College Students”. In line with this, we would like to ask permission from your
office to administer the attached questionnaires to the students of
Saint Palasi University.
Noted:
Sample letter to the respondents:
October 14, 2020
Dear respondent,
We, the student researchers of Baguio City National Science High School, are
currently conducting a study entitled “Factors Influencing the Choice of Course
among College Students”. In line with this, may we request you to accomplish the
attached questionnaire? Rest assured your answer in the questionnaire will be kept
This is the part of your research paper where all the statistical tools used
have to be presented, including how they were used and their formulas. Before
you decide which statistical tool you are going to use, you have to consider the
type of variable, the distribution of data, and the research design.
For the distribution of data, let us discuss the parametric test and the
non-parametric test.
there are statistical tests that can be used like the Kolmogorov-Smirnov test, the
Anderson-Darling test, and the Shapiro-Wilk test.
Non-parametric tests are called “distribution free” tests because they are based
on fewer assumptions (for example, they do not assume that the outcome is
approximately normally distributed).
The research design and the type of variable were discussed in
Module 1 and Module 3, respectively. Let us discuss the types of research question
and the appropriate statistical tools to be used for each.
1. Descriptive Question – typically asks the question, “What is…?” with the
underlying purpose of describing the significance of a situation,
state, or existence of a specific phenomenon. This is often associated
with revealing hidden or understudied issues.
- The commonly used statistical tool is the mean. The formula is
$$\text{Mean} = \frac{\sum x}{n}$$
Where:
$\sum$ is the summation symbol
$x$ is the responses of the respondents
$n$ is the number of respondents
The statistical tools that you may use for comparative questions are
presented in the table below.
Parametric Test                      Non-parametric Test
Paired-sample t-test                 Wilcoxon Matched-Pairs Signed-Rank Test
k-sample ANOVA (using F-test)        Kruskal-Wallis H Test
In this module, we will be focusing on the parametric tests. The discussion
of each test is presented below.
a.1. One-sample t-test - this is used to test a null hypothesis that the population
mean is equal to a specified value.
The formula is
$$t = \frac{\bar{x} - \mu}{\dfrac{s}{\sqrt{n}}}$$
No.     $x$      $x - \bar{x}$     $(x - \bar{x})^2$
1       73       -12.15            147.6225
2       78       -7.15             51.1225
3       82       -3.15             9.9225
4       67       -18.15            329.4225
5       90       4.85              23.5225
6       85       -0.15             0.0225
7       88       2.85              8.1225
8       92       6.85              46.9225
9       98       12.85             165.1225
10      64       -21.15            447.3225
11      74       -11.15            124.3225
12      90       4.85              23.5225
13      97       11.85             140.4225
14      83       -2.15             4.6225
15      79       -6.15             37.8225
16      83       -2.15             4.6225
17      86       0.85              0.7225
18      95       9.85              97.0225
19      99       13.85             191.8225
20      100      14.85             220.5225
Total   $\sum x = 1,703$           $\sum (x - \bar{x})^2 = 2,074.55$
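Computation of Mean:
$$\bar{x} = \frac{\sum x}{n} = \frac{1703}{20} = 85.15$$
Sample Computation for the fourth column:
$$(x - \bar{x})^2 = (-12.15)^2 = 147.6225$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Where: $\sum (x - \bar{x})^2$ is the value computed in the table above
$n$ is the number of samples
$s = \sqrt{\dfrac{2074.5500}{20 - 1}}$    Substitution.
$= \sqrt{\dfrac{2074.5500}{19}}$    Subtract in the denominator.
$= \sqrt{109.1868}$    Divide.
$= 10.4493$    Get the square root.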
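Formula
$$t = \frac{\bar{x} - \mu}{\dfrac{s}{\sqrt{n}}}$$
$t = \dfrac{85.15 - 80}{\dfrac{10.4493}{\sqrt{20}}}$    Substitution.
$t = \dfrac{5.15}{\dfrac{10.4493}{4.4721}}$    Subtract in the numerator and get the square root of 20.
$t = \dfrac{5.15}{2.3366}$    Divide in the denominator.
$t = 2.2041$    Divide.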
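The one-sample computation above can be checked with a few lines of Python. This is only a minimal sketch using the standard library: the 20 scores and the hypothesized mean of 80 come from the worked example, while the function name `one_sample_t` is a hypothetical helper introduced here for illustration.

```python
import math
import statistics

# Scores of the 20 respondents in the worked example
scores = [73, 78, 82, 67, 90, 85, 88, 92, 98, 64,
          74, 90, 97, 83, 79, 83, 86, 95, 99, 100]
mu = 80  # hypothesized population mean used in the substitution step


def one_sample_t(data, mu):
    """Return (mean, sample standard deviation, t) for a one-sample t-test."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # divides by n - 1, like the formula for s
    t = (mean - mu) / (s / math.sqrt(n))
    return mean, s, t


mean, s, t = one_sample_t(scores, mu)
print(f"mean = {mean:.2f}, s = {s:.4f}, t = {t:.4f}")
```

The script only reproduces the arithmetic; comparing the computed t-value with the tabular value is still done by hand with the t-distribution table.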
This is used to compare the means of two sets of data taken from different
groups of respondents.
Examples:
1. Comparing the pre-test scores taken from two sections of grade 12
students of Baguio City National Science High School.
2. Comparing before and after training test scores of two different groups
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Before you can use the formula above, the value of the pooled sample variance
($s^2$) should be computed first. The formula for the pooled sample variance is
$$s^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$$
Where: $\sum$ is the summation symbol
$x_1$ refers to the data of the first group
$x_2$ refers to the data of the second group
$\bar{x}_1$ is the mean of the first group
$\bar{x}_2$ is the mean of the second group
$n_1$ is the number of respondents in the first group
$n_2$ is the number of respondents in the second group
No.     $x_1$   $x_2$   $x_1 - \bar{x}_1$   $(x_1 - \bar{x}_1)^2$   $x_2 - \bar{x}_2$   $(x_2 - \bar{x}_2)^2$
1       3       20      -4.20               17.64                   1                   1
2       3       13      -4.20               17.64                   -6                  36
3       3       13      -4.20               17.64                   -6                  36
4       12      20      4.80                23.04                   1                   1
5       15      29      7.80                60.84                   10                  100
Total   36      95                          136.80                                      174
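Formula
$$s^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$$
$s^2 = \dfrac{136.80 + 174}{5 + 5 - 2}$    Substitute the corresponding values found in the table.
$s^2 = 38.85$    Simplify.
Formula
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
$t = \dfrac{-11.80}{\sqrt{38.85\left(\dfrac{1}{5} + \dfrac{1}{5}\right)}}$    Substitution.
$t = \dfrac{-11.80}{\sqrt{38.85(0.40)}}$    Convert to decimal then add.
$t = \dfrac{-11.80}{\sqrt{15.54}}$    Multiply 38.85 by 0.40.
$t = \dfrac{-11.80}{3.94}$    Get the square root of 15.54.
$t = -2.99$    Divide.
Get the absolute value of the computed t-value: $|t| = 2.99$.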
Since the computed t-value, which is 2.99, is greater than the tabular
t-value, which is 2.306, reject the null hypothesis. This shows that there is a
significant difference between the means of the pre-test scores of male and
female He 11 students of BCNSHS.
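The pooled-variance computation can be sketched in Python as follows. The two lists are the $x_1$ and $x_2$ columns of the worked example; the function name `pooled_t` is a hypothetical helper, and the sketch assumes both groups share a common variance, as the pooled formula does.

```python
import math

# Scores of the two independent groups in the worked example
x1 = [3, 3, 3, 12, 15]
x2 = [20, 13, 13, 20, 29]


def pooled_t(a, b):
    """Return (pooled sample variance, t) for two independent samples."""
    n1, n2 = len(a), len(b)
    mean1, mean2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((v - mean1) ** 2 for v in a)  # sum of squared deviations, group 1
    ss2 = sum((v - mean2) ** 2 for v in b)  # sum of squared deviations, group 2
    s2 = (ss1 + ss2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return s2, t


s2, t = pooled_t(x1, x2)
print(f"pooled variance = {s2:.2f}, |t| = {abs(t):.2f}")
```

The sign of t depends only on which group is listed first, which is why the absolute value is compared with the tabular value.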
- This is used to compare means of two sets of data taken from the same
group of respondents.
Examples:
1. Comparing before and after training test scores of the same group of
students.
2. The difference of two blood pressure measurements of the same group
of senior citizens using different equipment.
3. Comparing the pre-test and post-test scores taken from the same
section of grade 12.
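$$t = \frac{\dfrac{\sum (x - y)}{n}}{\sqrt{\dfrac{\sum (x - y)^2 - \dfrac{\left(\sum (x - y)\right)^2}{n}}{(n - 1)\,n}}}$$
Where: $n$ is the number of respondents
$\sum$ is the summation symbol
$x$ is the group of first data
$y$ is the group of second data
Note: Get the absolute value of the computed t-value.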
$$t = \frac{-11.8}{\sqrt{\dfrac{52.8}{20}}} = \frac{-11.8}{\sqrt{2.64}} = -7.26$$
Get the absolute value of the computed t-value. Therefore, the computed t-value
is 7.26.
Now, look for the tabular t-value in the t-distribution table on page 30
using the 5% level of significance for a two-tailed test and degrees of freedom
of $(n - 1)$. You will see that the tabular t-value is 2.776.
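A short Python sketch of the paired-sample formula is given below. The five score pairs are an assumption: they are the $x_1$ and $x_2$ values of the earlier two-group table, reused here because they give the same mean difference (-11.8) and the same value 52.8 under the square root as the computation above. The function name `paired_t` is a hypothetical helper.

```python
import math

# Assumed paired scores (first and second measurement of the same five respondents)
x = [3, 3, 3, 12, 15]
y = [20, 13, 13, 20, 29]


def paired_t(first, second):
    """Return t for paired samples using the sum-of-differences formula."""
    n = len(first)
    d = [a - b for a, b in zip(first, second)]  # x - y for each respondent
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    numerator = sum_d / n
    denominator = math.sqrt((sum_d2 - sum_d ** 2 / n) / ((n - 1) * n))
    return numerator / denominator


t = paired_t(x, y)
print(f"|t| = {abs(t):.2f}, degrees of freedom = {len(x) - 1}")
```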
a.4.1. One-way ANOVA – is used when you are going to test one independent
variable with two or more groups.
Examples:
1. Comparing the pre-test scores of the learners from three sections of
STEM 11 of BCNSHS.
2. Comparing before and after training test scores of three different
groups of students.
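$$MSSW = \frac{\sum (x - \bar{x})^2}{n - k}$$
$$MSSB = \frac{\sum n(\bar{x} - \bar{x}_G)^2}{k - 1}$$
$$F = \frac{MSSB}{MSSW}$$
$$df_B = k - 1$$
$$df_W = n - k$$
Where:
$MSSW$ is the within groups mean sum of squares
$MSSB$ is the between groups mean sum of squares
$x$ refers to the raw data
$\bar{x}$ is the mean of each group
$\bar{x}_G$ is the mean of the means or the grand mean
$df$ is the degrees of freedom
$k$ is the number of groups
$n$ is the number of respondents (in $MSSB$, the number of respondents in each group)
$F$ is the variance ratio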
Example:
A teacher from Regional Science High School wants to assess the 3
sections of STEM 11 from BCNSHS based on their General Mathematics pre-test
scores.
Null Hypothesis: There is no significant difference between the scores of the STEM 11 students from
BCNSHS when they are grouped according to section.
Compute the mean score of each group. The mean score of STEM A was
used to complete the 4th column, the mean score of STEM B was used to
complete the 5th column, and the mean score of STEM C was used to complete
the 6th column.
Mean score of STEM B
$$\bar{x}_2 = \frac{5+6+3+5+4+6+5+4+5+5+6+7+6}{13} = \frac{67}{13} = 5.1538$$
Sample computation for the 5th column
$$(x_2 - \bar{x}_2)^2 = (5 - 5.1538)^2 = (-0.1538)^2 = 0.0237$$
Mean score of STEM C
$$\bar{x}_3 = \frac{1+3+4+3+1+1+2+6+5+4+3+4+5}{13} = \frac{42}{13} = 3.2308$$
Sample computation for the 6th column
$$(x_3 - \bar{x}_3)^2 = (1 - 3.2308)^2 = (-2.2308)^2 = 4.9765$$
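Let us identify or solve the needed information in computing the variance
ratio then solve for the variance ratio.
Given:
* $n = 36$
* $k = 3$
* $df_B = k - 1 = 3 - 1 = 2$
* $df_W = n - k = 36 - 3 = 33$
* $\bar{x}_G = \dfrac{\sum x}{n} = \dfrac{7+7+6+9+8+7+6+7+8+9+5+\dots+3+4+5}{10+13+13} = \dfrac{183}{36} = 5.0833$
* $MSSW = \dfrac{\sum (x - \bar{x})^2}{n - k} = \dfrac{10.4000 + 13.6923 + 32.3077}{36 - 3} = \dfrac{56.4000}{33} = 1.7091$
* $MSSB = \dfrac{\sum n(\bar{x} - \bar{x}_G)^2}{k - 1} = \dfrac{10(7.40 - 5.0833)^2 + 13(5.1538 - 5.0833)^2 + 13(3.2308 - 5.0833)^2}{3 - 1}$
  $= \dfrac{10(5.3671) + 13(0.0050) + 13(3.4318)}{2} = \dfrac{53.6710 + 0.0650 + 44.6134}{2} = \dfrac{98.3494}{2} = 49.1747$
Solution:
$$F = \frac{MSSB}{MSSW} = \frac{49.1747}{1.7091} = 28.7723$$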
The computed F-value is 28.7723. For the tabular F-value, you will find it in
the F-distribution table on page 31 with the 5% level of significance, the
numerator degrees of freedom of $df_B = 3 - 1 = 2$, and the denominator degrees
of freedom of $df_W = 36 - 3 = 33$. You can see that the tabular F-value is 3.285.
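The one-way ANOVA arithmetic can be checked with a minimal Python sketch. The STEM B and STEM C scores are the ones listed in the mean computations; the STEM A scores are assumed to be the first ten values listed in the grand mean computation (they give the stated mean of 7.40). The function name `one_way_anova` is a hypothetical helper.

```python
# Pre-test scores per section (STEM A is inferred from the grand mean computation)
stem_a = [7, 7, 6, 9, 8, 7, 6, 7, 8, 9]
stem_b = [5, 6, 3, 5, 4, 6, 5, 4, 5, 5, 6, 7, 6]
stem_c = [1, 3, 4, 3, 1, 1, 2, 6, 5, 4, 3, 4, 5]


def one_way_anova(groups):
    """Return (MSSB, MSSW, F) for a list of groups of scores."""
    n = sum(len(g) for g in groups)   # total number of respondents
    k = len(groups)                   # number of groups
    grand_mean = sum(sum(g) for g in groups) / n
    ss_within = 0.0
    ss_between = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((v - group_mean) ** 2 for v in g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
    mssw = ss_within / (n - k)
    mssb = ss_between / (k - 1)
    return mssb, mssw, mssb / mssw


mssb, mssw, f_value = one_way_anova([stem_a, stem_b, stem_c])
print(f"MSSB = {mssb:.4f}, MSSW = {mssw:.4f}, F = {f_value:.4f}")
```

Because the script does not round the means before squaring, its last decimal places can differ slightly from the hand computation.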
a.4.2. Two-way ANOVA – compares multiple groups of two factors. It considers
the effect of two factors and the effect of two categorical factors on each other.
It meets the three principles of design of experiments which are replication,
randomization, and local control. In this case, there are 3 null hypotheses that
are being tested.
Example: A teacher wants to assess her students based on their scores in General
Mathematics considering their age and sex. The data are presented below.
Research Questions:
1. Is there a significant difference between the scores of students when they are grouped
according to their age group and sex?
2. Is there an interaction between age group and sex?
Null Hypotheses:
1. There is no significant difference between the scores of students when they are
grouped according to sex.
2. There is no significant difference between the scores of students when they are
grouped according to age.
3. There is no interaction (relationship) between age and sex.
$$\bar{x}_M = \frac{4+6+8+6+6+9+8+9+13}{9} = \frac{69}{9} = 7.6667$$
$$\bar{x}_F = \frac{4+8+9+7+10+13+12+14+16}{9} = \frac{93}{9} = 10.3333$$
The formula is
$$SS_g = n_1(\bar{x}_M - \bar{x}_G)^2 + n_2(\bar{x}_F - \bar{x}_G)^2$$
Where:
$n_1$ is the number of males
$n_2$ is the number of females
$\bar{x}_M$ is the mean score of males
$\bar{x}_F$ is the mean score of females
$\bar{x}_G$ is the grand mean
$$SS_g = 15.9992 + 15.9992 = 31.9984$$
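The formula is
$$SS_a = n_a(\bar{x}_a - \bar{x}_G)^2 + n_b(\bar{x}_b - \bar{x}_G)^2 + n_c(\bar{x}_c - \bar{x}_G)^2$$
Where:
$n_a$ is the number of 15 years old students
$n_b$ is the number of 16 years old students
$n_c$ is the number of 17 years old students
$\bar{x}_a$ is the mean score of 15 years old students
$\bar{x}_b$ is the mean score of 16 years old students
$\bar{x}_c$ is the mean score of 17 years old students
$$\bar{x}_a = \frac{4+6+8+4+8+9}{6} = \frac{39}{6} = 6.5$$
$$\bar{x}_b = \frac{6+6+9+7+10+13}{6} = \frac{51}{6} = 8.5$$
$$\bar{x}_c = \frac{8+9+13+12+14+16}{6} = \frac{72}{6} = 12$$
Solution:
$$SS_a = 6(6.5 - 9)^2 + 6(8.5 - 9)^2 + 6(12 - 9)^2 = 37.5 + 1.5 + 54 = 93$$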
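The formula is
$$SS_w = \sum (x - \bar{x})^2$$
Solution:
Sex   Age   $x$    $\bar{x}$   $(x - \bar{x})^2$
1     1     4      6           $(4 - 6)^2 = 4$
1     1     6      6           $(6 - 6)^2 = 0$
1     1     8      6           $(8 - 6)^2 = 4$
1     2     6      7           $(6 - 7)^2 = 1$
1     2     6      7           $(6 - 7)^2 = 1$
1     2     9      7           $(9 - 7)^2 = 4$
1     3     8      10          $(8 - 10)^2 = 4$
1     3     9      10          $(9 - 10)^2 = 1$
1     3     13     10          $(13 - 10)^2 = 9$
2     1     4      7           $(4 - 7)^2 = 9$
2     1     8      7           $(8 - 7)^2 = 1$
2     1     9      7           $(9 - 7)^2 = 4$
2     2     7      10          $(7 - 10)^2 = 9$
2     2     10     10          $(10 - 10)^2 = 0$
2     2     13     10          $(13 - 10)^2 = 9$
2     3     12     14          $(12 - 14)^2 = 4$
2     3     14     14          $(14 - 14)^2 = 0$
2     3     16     14          $(16 - 14)^2 = 4$
                               $\sum (x - \bar{x})^2 = 68$
Thus, $SS_w = \sum (x - \bar{x})^2 = 68$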
First, compute the mean square for gender using the formula:
$$MS_g = \frac{SS_g}{df_g}$$
Since there are 2 identified sexes, which are male and female, $n = 2$ and $df_g = n - 1 = 1$.
Solution:
$$MS_g = \frac{SS_g}{df_g} = \frac{31.9984}{1} = 31.9984$$
Second, compute the mean square for the sum of squares within (error). The
formula is
$$MS_w = \frac{SS_w}{df_w}$$
Since there are 3 males that are 15 years old, 3 males that are 16 years old,
3 males that are 17 years old, 3 females that are 15 years old, 3 females that
are 16 years old, and 3 females that are 17 years old, the degrees of freedom is
12 using the formula $n - 1$. Subtract 1 from each group considering the age and sex. Thus,
$$MS_w = \frac{68}{12} = 5.6667$$
$$F_g = \frac{MS_g}{MS_w}$$
Where:
$MS_g$ is the mean square for sex
$MS_w$ is the mean square for within
$$F_g = \frac{MS_g}{MS_w} = \frac{31.9984}{5.6667} = 5.6467$$
Solution:
First, compute the mean square for age using the formula:
$$MS_a = \frac{SS_a}{df_a}$$
Solution:
$$MS_a = \frac{93}{2} = 46.5$$
Last, compute the F-value for age using the formula,
$$F_a = \frac{MS_a}{MS_w}$$
Where:
$MS_a$ is the mean square for age
$MS_w$ is the mean square for within
Solution:
$$F_a = \frac{MS_a}{MS_w} = \frac{46.5}{5.6667} = 8.2058$$
significant difference between the scores of students when they are grouped
according to age.
c. For the hypothesis, "There is no interaction (relationship) between age and sex," we
have several steps to do.
First, compute the total sum of squares by deducting the grand
mean from each score then square the difference. That is,
$$SS_t = \sum (x - \bar{x}_G)^2$$
$= (4-9)^2 + (6-9)^2 + (8-9)^2 + (6-9)^2 + (6-9)^2 + (9-9)^2 + (8-9)^2 + (9-9)^2 + (13-9)^2$
$\;+ (4-9)^2 + (8-9)^2 + (9-9)^2 + (7-9)^2 + (10-9)^2 + (13-9)^2 + (12-9)^2 + (14-9)^2 + (16-9)^2$
$= 25 + 9 + 1 + 9 + 9 + 0 + 1 + 0 + 16 + 25 + 1 + 0 + 4 + 1 + 16 + 9 + 25 + 49$
$= 200$
Second, compute the sum of squares for both factors (age and
gender). The formula is,
$$SS_b = SS_t - SS_g - SS_a - SS_w$$
Where:
$SS_b$ is the sum of squares for both factors
$SS_t$ is the total sum of squares
$SS_a$ is the sum of squares for age
$SS_w$ is the sum of squares within
$SS_g$ is the sum of squares for gender
Third, compute the mean square for both factors. The formula is:
$$MS_b = \frac{SS_b}{df_b}$$
Where:
$SS_b$ is the sum of squares for both factors
$df_b$ is the degrees of freedom for both factors
$MS_b$ is the mean square for both factors
$$MS_b = \frac{7}{2} = 3.5$$
Last, compute the F-value for both factors using the formula,
$$F_b = \frac{MS_b}{MS_w}$$
$$F_b = \frac{MS_b}{MS_w} = \frac{3.5}{5.6667} = 0.6176$$
The computed $F_b$-value is 0.6176.
Source    Sum of Squares   df    Mean Square
$SS_w$    68               12    5.6667
$SS_t$    200              17
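The two-way ANOVA sums of squares and F-values can be reproduced with the Python sketch below. It rests on two assumptions read off the worked tables: sex code 1 is male and code 2 is female, and age codes 1, 2, and 3 stand for 15, 16, and 17 years old. The names `cells` and `two_way_anova` are hypothetical and introduced only for this illustration.

```python
def mean(values):
    return sum(values) / len(values)


# cells[sex][age] holds the three scores of each sex-by-age group (coding assumed)
cells = {
    "male": {15: [4, 6, 8], 16: [6, 6, 9], 17: [8, 9, 13]},
    "female": {15: [4, 8, 9], 16: [7, 10, 13], 17: [12, 14, 16]},
}


def two_way_anova(cells):
    """Return sums of squares and F-values for sex, age, and their interaction."""
    sexes = list(cells)
    ages = list(cells[sexes[0]])
    every = [x for s in sexes for a in ages for x in cells[s][a]]
    grand = mean(every)

    ss_total = sum((x - grand) ** 2 for x in every)
    ss_sex = 0.0
    for s in sexes:
        group = [x for a in ages for x in cells[s][a]]
        ss_sex += len(group) * (mean(group) - grand) ** 2
    ss_age = 0.0
    for a in ages:
        group = [x for s in sexes for x in cells[s][a]]
        ss_age += len(group) * (mean(group) - grand) ** 2
    ss_within, df_within = 0.0, 0
    for s in sexes:
        for a in ages:
            cell = cells[s][a]
            ss_within += sum((x - mean(cell)) ** 2 for x in cell)
            df_within += len(cell) - 1  # subtract 1 from each group
    ss_both = ss_total - ss_sex - ss_age - ss_within

    df_sex, df_age = len(sexes) - 1, len(ages) - 1
    df_both = df_sex * df_age
    ms_within = ss_within / df_within
    return {
        "SS": (ss_sex, ss_age, ss_both, ss_within, ss_total),
        "F_sex": (ss_sex / df_sex) / ms_within,
        "F_age": (ss_age / df_age) / ms_within,
        "F_both": (ss_both / df_both) / ms_within,
    }


result = two_way_anova(cells)
print({name: round(value, 4) for name, value in result.items() if name != "SS"})
```

The script keeps full precision, so it gives a sum of squares for sex of 32 rather than 31.9984; the small gap comes from rounding the group means to four decimal places in the hand computation.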
F -Distribution Table, ALPHA = 0.05